Introduction

Prioritisation can largely be broken into two steps – geographic prioritisation and beneficiary selection.

This document examines data from the FAO-WFP Shocks, Agricultural Livelihoods and Food Security Survey, the Armed Conflict Location and Event Dataset (ACLED) and the Myanmar Information Management Unit (MIMU).

The first section deals with the development of a beneficiary profile using the FAO-WFP survey data to inform beneficiary selection. The second section makes use of ACLED data for Myanmar to develop a conflict score for township prioritisation. Township and beneficiary selection criteria are linked by a decision tree. The third section presents flood risks by township – these are only probabilities and serve as references for prepositioning and disaster risk reduction. They not included in the overall prioritisation process, though this might change if disasters occur. This document is closed by a section on technical notes.


References for this report

  • ACLED (2022). ACLED data for Myanmar (2010-2022). https://acleddata.com.
  • Maria Noel Pi Alperin, Philippe Van Kerm (2009). mdepriv: Synthetic indicators of multiple deprivation. Stata command. http://medim.ceps.lu/stata/mdepriv_v3.pdf
  • Atillio Benini, Aldo Benini (2021). mdepriv: Synthetic scores of multiple deprivation. R package version 0.0.3. https://github.com/a-benini/mdepriv/.
  • Gianni Betti, Vijay Verma (2004). A methodology for the study of multi-dimensional and longitudinal aspects of poverty and deprivation. Dipartamento di Metodi Quantitativi, Universita di Siena. http://repec.deps.unisi.it/quaderni/49DMQ.pdf.
  • FAO and WFP (2022). Myanmar | Shocks, agricultural livelihoods and food security: Monitoring report, March 2022. Rome.
  • Food Security Cluster, Myanmar (2021). 5Ws reporting tool.
  • Food Security Cluster, Myanmar (2022). Understanding Conflict Dynamics in Myanmar through Conflict and Incident Data: A Food Security Perspective. https://food-security-cluster-myanmar.github.io/exploratory-data-analysis-acled-fsc/.
  • HARP-F and MIMU (2018). Vulnerability in Myanmar: A Secondary Data Review of Needs, Coverage and Gaps. http://themimu.info/vulnerability-in-myanmar.
  • IFPRI (2022). Household Welfare in Myanmar: Results from the Myanmar Household Welfare Survey.
  • MIMU (2022). Climate, Environmental Degradation and Disaster Risk in Myanmar.
  • UNHCR (2022). Pre and Post 1 Feb 2021 IDPs Population by Township level.
  • WFP Nigeria Country Office (2018). Standard Operating Procedure: Beneficiary Targeting in North Eastern Nigeria. Directive: NCO/PU/2019/04.




1. Beneficiary profile, using FAO-WFP survey data

A total of 2,708 households were interviewed in the FAO-WFP Shocks, Agricultural Livelihoods and Food Security Survey in Ayeyarwady, Chin, Kachin, Kayah, Kayin, Mon, Rakhine, Shan and Yangon to monitor food security and livelihoods of persons in Myanmar. Some of the most key data collected were on the various food security indices – the Food Consumption Score (FCS), the Food Insecurity Experience Scale (FIES) and the Livelihood Coping Strategies Index (CSI). Much more detail can be gleaned from the report itself – this document focuses only on the aspects of the survey which can be used for prioritisation. Information was also collected on basic household demographics as well on income sources.



1.1 Overview of the food security indicators

A composite of the three indices, the Food Insecurity Score has also been calculated. The construction of this composite score is detailed in section 4.2. But going forward, this document will use the Food Insecurity Score as a proxy for food insecurity itself.

The plots below ranks variables in order of their ability to predict the various food insecurity indices. Only environmental or demographic variables were taken into account as it would be redundant to predict food security indices using the components of said indices. Only responses that occurred more than 200 times (out of 2,708) in the survey were considered.

Variables that are more red are most likely to result in high food insecurity scores (that is, higher food insecurity) and variables that are more blue are more likely to result in lower food insecurity scores. The dotted red line down the middle at \(x=0\) splits variables into whether the predict higher food insecurity (positive numbers) or lower food insecurity (negative numbers).

For all these plots, the scores of the food security indices have all been scaled between 0 and 1, with higher scores indicating greater food insecurity. The coefficient estimates below show how much a positive response on these variables would increase or decrease the score on each of the indices.



The variables that tend to to contribute the most to high food insecurity scores are a household’s exposure to conflict and natural hazards, whether or not they lost employment or work opportunities, whether or not they were engaged in casual non-agricultural labour and whether or not they reported no income in the three months prior to the survey.

These variables can be thought of as a basic beneficiary profile that can inform beneficiary selection criteria for food security and livelihoods programming. This list is, of course, non-exhaustive as it is limited to only the variables that were collected as part of the FAO-WFP survey.

The variables that were the best predictors of low food insecurity were whether or not the head of household had completed higher education, whether or not the main income source of the household was stable non-agricultural employment.



1.2 Setting a threshold for prioritisation

An important step is narrowing down the population to a priority group – Food Security Cluster partners do not have the resources to target all persons.

A household would be considered a priority for intervention when above the 75th percentile in at least two out of the three food security indices (food consumption score, coping strategies index and food insecurity experience scale)

Approximately 34% of the sampled population falls into this priority group. The plots below shows the differences between the average values of food security index components based on whether they are priority households or not. Variables in blue correspond to Coping Strategies Index indicators, those in red to the Food Insecurity Experience Scale.

Those in green correspond to the Food Consumption Score. For these indicators, they have been rescaled so that a value of 1 shows that a household has not consumed that food group at all in the previous 7 days and a value of 0 shows that they have consumed that food group every day for the past 7 days.



Whilst both groups tend to consume low amounts of dairy, pulses, sugar and fruits, these trends are much more pronounced in the priority group. Additionally, households in the priority group are more than twice as likely to have poor protein consumption, be worried about not having enough food and eat less due to a lack of resources.

Households in the priority group were also more than four times as likely to have reduced healthcare expenditures, have sold productive assets and be unable to eat healthy and nutritious foods and eat only a few kinds of foods. They are also more than five times as likely to skipped meals or gone hungry.


1.3 Variables considered in the beneficiary profile

The interactive reference table below shows the various environmental and demographic variables collected by the FAO-WFP survey. The first column lists the variable name, the second and third show the mean values for each variable based on households inside and outside of the priority group – for instance, from the table, it can be seen that 2.96% of households within the priority group do not have improved sanitation whereas for households outside of the priority group, this percentage is 1.42%.

The fourth column shows the percentage difference between households in the priority group and households outside of it. To continue the example, households in the priority group are 107.55% more likely to not have access to improved sanitation than households outside of the priority group. The table is sorted in descending order based on this column.

The last column is the percentage of the surveyed population that the variable corresponds to. So, in spite of households in the priority group having much lower access to improved sanitation, this variable only applies to 2.23% of the total population, meaning that whilst this indicator might be very predictive of households in the priority group, using it on its own would lead to high exclusion errors as it only applies to a small proportion of the population. Sorting the table by %_sample.



As expected, given that the priority grouping is largely based on the Food Consumption Score, Coping Strategies Index and the Food Insecurity Experience Scale, the variables that are least likely (as can be seen by their negative %_difference) to be associated with households in the priority group are the household head having completed higher education, having a stable source of non-agricultural income, having the main income source be from livestock and having self-employment or non-work (rents etc.) being the main income source.

Partners are encouraged to make use of the list of variables above, in addition to community perspectives and their own expert judgement, to formulate the beneficiary selection criteria for their food security programmes. The next section makes use of decision trees to find which of these variables are the most useful in splitting the population into households within the priority group and those outside of it.



1.4 Decision trees

This section focuses on the creation of simple tools to aid in the identification of households in the priority group. This involves narrowing down the list of 36 variables in the reference table above to develop two simple decision trees, one focused on a household’s environment and circumstances and the other on a household’s demographics.

Both these trees have been kept simple as making them deeper (i.e. by including additional variables and more nodes) provides very limited gains in exchange for a lot of added complexity. Additionally, as decision trees grow larger, they get much less interpretable.

Each tree starts at the root, node [1] in this case, which contains the entire population. From there, it branches out based on the various conditions at each node until it terminates in four groups (4, 5, 6 and 7 in the tree below). Within each blue square, the figure in bold show the percentage of each group that are priority households and the figures in italics show the percentage of the overall population in each node.

The first tree, dealing with a household’s environment and circumstances is comprised of three splits – whether a household is in a rural area, whether they have experienced violence, insecurity or conflict and whether they have lost work or employment opportunities.


Decision tree – environment and circumstances

The percentage of priority households in rural areas is much higher than in urban ones (37.6% vs 26.1%). For rural areas, the indicator that best distinguishes priority households is whether or they have directly experienced violence, insecurity or conflict in the past three months. The factor that best distinguishes priority households in urban areas is whether or not they have lost work or employment opportunities in the past three months.

The performance of the decision tree is not exactly spectacular – if a partner wanted to target an entire community, the threshold for blanket coverage would be around 75% or 80% of households being in the priority group, which none of the terminal nodes achieve. But the tree is very understandable at least provides some direction for identifying beneficiaries at the community level once geographic prioritisation has been conducted (which is the topic of section 2).

These findings are also consistent with IFPRI’s survey on household welfare where rural households were found to be more asset-poor and less resilient than urban ones. However, much more work remains to be done on triangulating the FAO-WFP survey with the IFPRI one, which was much more comprehensive in scope and scale.

The second tree below makes use of demographic variables (age of head of household, whether or not the head of the household is female etc.). These have been treated separately from the environmental and circumstantial variables above as combining them led to trees that did not make sense i.e. whether or not a household has children under 5 should have no bearing on whether or not they lost access to markets in the past three months. Both trees start from the same root i.e. 100% of the population.

In the second tree below, which splits the population into three terminal nodes (B, D and E), households whose heads have not completed secondary and/or higher education and have children under 5 years of age are most likely to fall into the priority group (46.8%). Households whose heads have competed secondary and/or higher education are least likely to fall into the priority group, irrespective of the age of their children or even if they have children.


Decision tree – demographics

These decision trees have been constructed not to direct decisions in partners’ programmes, but more to inform them, and should not be treated as absolutes – there are a wide variety of contexts in Myanmar and each community will likely have additional relevant criteria.

But the application of the conditions set in this tree and the first one (in blue) will lead to much lower inclusion and exclusion errors in beneficiary selection. The Food Security Cluster remains open to more data that will help improve our collective understanding of food insecurity in Myanmar.



1.5 Exceptions for targeting agricultural households

Returning momentarily to the first tree on environmental and circumstantial variables (in blue), 64.7% of the total population is in the tree node [6] (the nodes are all numbered in the decision tree above) – within this group are households who reside in rural areas but have not been affected by violence, conflict or insecurity. Given that this is a significant chunk of the population where, with reference to the plot below, 78% of the population is involved in agriculture, an exception can be made in this one instance, to support partners who might want to select beneficiaries within the large group of agricultural households who have not been affected by conflict.

It must still be mentioned that aid must first flow to the households in node [7], where the proportion of households in the priority group is highest. Only after the needs of this population are met should households in node [6] be targeted.



The tree below carries on from node [6] of the main decision tree on environmental and circumstantial variables. Within this group of rural households not affected by conflict (64.7% of the total population), identifying households who have lost access to markets due to conflict, infrastructure damage or other access issues and households who have been affected by natural hazards (the most common of which is flooding) improves the chances of targeting households in the priority group to slightly above 50%.

Any further conditions applied on the tree only result in extremely marginal gains.


Decision tree – from tree node [6]


To further inform the selection of beneficiaries in these areas, the plot below shows the coefficients of environmental and demographic variables on food insecurity within the households in node [6]. The more positive (more red) the coefficient, the higher the impact on food insecurity.

The variables within the FAO-WFP survey most of use to partners targeting priority households in node [6] are whether or not a household has improved sanitation and the educational attainment of the head of the household. The most important shocks for this group were natural hazards and the loss of work or employment opportunities.




1.6 Impact of conflict on food security

Conflict has a real impact on food security. Exposure to conflict in these households was linked with a 0.032 increase in score, which is, on average, a 42.46% higher.

Though the FAO-WFP survey was not developed with measuring the impact of conflict in mind, it is nevertheless possible to control for some factors and examine the impact of conflict on several groups.

The plot below shows the impact on the overall food insecurity scores based on whether a household has been impacted by conflict or not. We have selected the two largest groups within the rural and urban context (farmers and workers employed in stable, non-agricultural employment) and have split these populations by education level.



The food insecurity scores of all groups except one are higher when a household has experienced conflict. This by no means constitutes a scientific proof (another survey would have to be conducted), but it is possible to state with some confidence that exposure to conflict has a negative impact on food security. The small sample size likely results in widely diverging values in some cases; additionally the purpose of the FAO-WFP survey was not to understand conflict dynamics.

The plot below further examines of the impact of conflict on food security. The estimates in the plot below show the impact of exposure to conflict on the various food security indices. As was done above, the food security indices used were re-scaled between 0 and 1, with higher scores indicating greater food insecurity.

Notably, exposure to conflict had the largest impact on the Food Insecurity Experience Scale in rural areas. All the effects were statistically significant with the exception of conflict on Food Consumption Scores in urban contexts (that point is marked by a triangle in the plot below to denote this).



Approximately 6% of all the households surveyed by FAO-WFP were directly affected by conflict, violence and insecurity. However, this does discount households that have been affected by the secondary impacts of conflict, such as loss of access to markets and increased food prices.

Whilst there is no set threshold whereby one can state that everyone in a given area has been affected by conflict, it is possible to determine where the conflict has been the most severe. In the next section, conflict scores, together with pre-existing vulnerability will be used to prioritise between townships.




2. Prioritisation of townships by exposure to conflict and pre-existing vulnerability

Myanmar had the most conflict events out of any country in 2021, exceeding Syria, Yemen and Afghanistan.


Increase in conflict events and fatalities compared to previous decade
years total_events avg_events_per_year total_fatalities avg_fatalities_per_year
2010-2020 8,510 774 9,451 859
2021 10,961 10,961 11,012 11,012
2022 5,873 5,873 9,383 9,383
Data source: ACLED; accleddata.com. Data for 2022 is until 2022-05-31.


Given the more than fourteenfold increase in conflict events and thirteenfold increase in conflict fatalities in Myanmar in 2021 (compared to the average for the preceding 10 years), conflict will be major part of determining which areas have the greatest humanitarian needs. However, scoring townships based only on one variable would lead to a very uni-dimensional understanding of the crisis.

Similar to how a beneficiary profile was established in order to inform beneficiary selection, this section, will focus on what variables should be considered when prioritising amongst the various geographic areas in Myanmar and how the results of such a prioritisation should be applied. As with earlier sections, the analysis here is meant to describe and inform programmatic and operational decision making, as opposed to dictating it, as each partner will have their own considerations and limitations.

For a more comprehensive review of conflict in Myanmar, please refer to the Food Security Cluster’s report Understanding Conflict Dynamics in Myanmar through Conflict and Incident Data: A Food Security Perspective. And for a more comprehensive review of multidimensional vulnerabilty in Myanmar, please refer to MIMU-HARP’s Vulnerability Study.



2.1 Conflict events and fatalities by state/region


Starting at the state/region level, Sagaing saw the highest number of conflict events as well as conflict as well as conflict-related fatalities in 2012. It experienced more than three times as many conflict-related fatalities than the next-highest state/region – Magway. This is a significant shift in the pattern of conflict in Myanmar, which has traditionally revolved around Kachin, Rakhine and Shan.

Kayah, Chin and Sagaing had the highest number of conflict fatalities per capita in 2021.




2.2 Conflict score by township

As can be seem from the state/region barplots in the previous section, the distribution of conflict events and fatalities is not even, and is skewed towards a few states/regions. This is also evident at township level. In the scatter plot below, the averages of the number of conflict events and the number of fatalities at the township level have been marked by the dotted red lines, dividing the plot into four quadrants.

To better discriminate amongst the numerous townships, a conflict score has been developed. A requisite for any prioritisation score or index for conflict should be the ability to distinguish, first and foremost, the townships in the upper right quadrant of the plot, which have the heaviest concentrations of conflict events and fatalities. These areas are also where, as we have established in the section on beneficiary selection, where the highest concentrations of households in the priority group will be found. For reference, 58 townships have both above-average numbers of conflict events and fatalities (upper-right quadrant) and 196 townships have both below-average numbers of conflict events and fatalities (bottom-left quadrant).



At its most basic, the conflict score is just the average of battles, explosions, remote violence, violence against civilians, strategic developments, non-peaceful protests and riots, conflict-related fatalities and IDPs (refer to section 4.5 for more details). Additional information can also be found in the FSC’s report Understanding Conflict Dynamics in Myanmar through Conflict and Incident Data: A Food Security Perspective. This score will be now be used as a shorthand for conflict incidence in Myanmar going forward.



2.3 Comparison with multidimensional vulnerability

Conflict, in spite of its out-sized role in the crisis in Myanmar, cannot serve as the only determinant of geographic prioritisation. In order to consider other factors, this document will make use of the MIMU-HARP Vulnerability Index, which is comprised of 8 indicators, selected for their ability to predict the rest of the variables in the 2015 Census dataset. Several of these variables were also collected at district-level and were used to create an updated 2019 Vulnerability index.

The specific indicators chosen were:

  • % of population without formal identification documents
  • % of population without a middle school education
  • % of females who were illiterate
  • % of households with bamboo or thatched roofs
  • % of households with safe sanitation
  • % of households with access to electricity
  • % of workers who are unpaid family workers
  • The child dependency ratio
  • The conflict index (reflecting battles, fatalities, violence against civilians and displacement)

These indicators were combined to construct a township-level vulnerability index. Of the indicators above, only the percentage of the population without a middle-school education and percentage of the population without formal identification documents were not collected in the 2019 Inter-Censal survey.

For the remaining indicators, they were bootstrapped using their 2015 township values and the 2019 district-level data. Going forward, the first seven components of the index will be extracted and treated separately as a shorthand for pre-existing multidimensional vulnerability in Myanmar.

However, the eighth component of the vulnerability index, the conflict index, first constructed in 2015 from ACLED data, can and has been updated with more recent data – this was done in the previous section. This next few sections will examine the relationship between underdevelopment and the updated 2021 conflict score and the resulting implications on geographic prioritisation for the Food Security Cluster.

Firstly, from the scatter plot below, underdevelopment and pre-existing vulnerability and conflict are not good predictors of each other. The points in the plot below, each representing a township, split largely across two arcs – with townships with high conflict scores largely falling around the median for multidimensional vulnerability and townships with very high multidimensional vulnerability tending to have lower conflict scores.



This is a reflection of the shift in conflict away from frontier areas to cities and towns that have more strategic targets. The broader involvement of the Bamar majority in the conflict has likely also contributed to this shift – prior to the February 2021 coup, the conflict was largely between Ethnic Armed Organisations and the Myanmar Armed Forces. However, combatants are now spread throughout the populations, especially with the proliferation of the People’s Defence Forces.

It has been observed that Ethnic Armed Organisations, such as the Kachin Independence Army, are working closely with the People’s Defence Forces, and are training and arming them. This has contributed to the front being moved further forward, outside of the areas that had been traditionally most affected by conflict.



2.4 2015 vulnerability bands

This shift in conflict patterns can be further examined by considering the vulnerability bands developed by MIMU-HARP alongside its vulnerability index. In addition to an overall vulnerability score per township, 8 main typologies of vulnerability were also constructed to illustrate the wide variation of contexts and needs in the different parts of the country as well as to group together similar townships so that they may be considered as separate programming blocs. Please refer to the MIMU-HARP Vulnerability Study for more details.

According to MIMU-HARP, the result of this grouping is a “lens allowing the most vulnerable to be considered more methodically in policy and programme planning”. By using these 8 typologies as a reference, it is possible to understand how the pattern of conflict has changed between 2015 and 2021.


Changes in conflict patterns between 2015 and 2021, by vulnerabilty band
conflict score
townships above avg conflict score
vulnerability_band 2019_vulnerability 2015_score 2022_score tsp_count %_>avg_2015 %_>avg_2021
  1. Extreme outliers, underdevelopment and conflict
0.549 0.055 0.045 36 38.89 11.11
  1. Conflict-affected, poor human development
0.449 0.053 0.081 25 44.00 16.00
  1. Hubs in conflict-affected areas
0.370 0.033 0.215 21 19.05 47.62
  1. Very low access to services and infrastructure
0.403 0.003 0.112 74 4.05 25.68
  1. Agricultural areas with high profits
0.356 0.003 0.108 64 1.56 25.00
  1. Secondary cities/towns in agricultural areas
0.300 0.008 0.127 65 12.31 38.46
  1. Up-and-coming peri-urban and urban areas
0.228 0.000 0.120 11 0.00 27.27
  1. Affluent, urban core
0.123 0.001 0.156 34 2.94 58.82
Data source: ACLED (accleddata.com) and MIMU; higher scores indicate more vulnerability/conflict


Conflict, once much more concentrated in underdeveloped frontier areas in the operational areas of Ethnic Armed Organisations, is now much more pronounced in bands 3, 6 and 8, which contain major population centres and secondary cities and towns. Hubs in conflict-affected areas have the highest average 2021 conflict scores.

Although conflict has persisted or even increased in many of the frontier and remote areas such as Laukkaing, Mongyai and Tangyan (all from band 1), the relative rankings of these areas have changed and they now form a much smaller share of unionwide conflict than they did in 2015. These townships tend to have much lower conflict scores in spite of their high vulnerability, indicating that whilst they remain development priorities, they should fall outside the caseload for humanitarian action. It is recommended that new vulnerability bands be re-developed, as they have proved very useful; but that is outside the scope of this document.



2.5 Clustering townships

In recognition of the different contexts present in Myanmar (and the consequent need for different programming options), a simple K-means clustering was conducted on the townships to split them into prioritisation groups based on their 2021 conflict score, their 2019 vulnerability score and their population density.

This clustering separates all 330 townships into five groups. The plots below show the spread of townships by prioritisation group across the 2021 conflict score, 2019 vulnerability score and population density.



It can be seen that groups A1 and A2, which have the highest conflict scores, are quite distinct from group D (where the majority of the development caseload resides). Group C has neither high vulnerability nor high conflict incidence. And group B consists of solely urban centres. The table below provides more detail on each of the groups.

Groups A1 and A2 contained 60% of all conflict events and 83% of all conflict fatalities in 2021. Groups A1 and A2 can be distinguished from each other by the intensity of conflict, with A1 being where the concentrations of conflict events and fatalities are the heaviest.

An interactive version of the plot on the left can be found below.


Summary statistics by prioritisation group
group %_conflict_events %_fatalities avg_conflict_score vulnerability_2019 ppl_km2 townships %_total_population
A1 27.22 45.21 0.289 0.358 107 20 6.32
A2 33.22 37.79 0.138 0.349 133 49 17.76
B 11.95 3.39 0.087 0.122 19,326 33 12.12
C 23.90 11.51 0.024 0.340 213 172 51.65
D 3.70 2.09 0.016 0.551 64 56 12.15
Groups A1 and A2 have the highest conflict scores and should be prioritised over the others. Higher scores indicate more vulnerability/conflict.


Groups A1 and A2 both have middling vulnerability scores, but have much higher average conflict scores. Group A1, in particular, has a very high concentration of conflict incidents and fatalities, in addition to having the second-highest vulnerability scores of the groups. These 69 townships (containing about 24% of the population) are clear priorities for humanitarian action.




2.6 Linking geographic prioritisation and beneficiary selection

Using the classifications from the decision tree in section 1.3, we can see that in rural areas, the highest proportions of people who are in the priority group are those who have been affected by conflict, followed by those who were affected by conflict and have lost work or employment opportunities. Rural areas severely-affected by conflict (the townships in groups A1 and part of A2) are the areas in which we may reach populations with the highest proportions of households in the priority group.


Where are the highest proportions of persons in priority groups?
Rural Shocks_Conflict Shocks_Lost_Work %_Priority
no no yes 42.13
yes no yes 42.21
yes yes no 65.95
yes yes yes 61.27
Data sources: ACLED; acleddata.com and FAO-WFP


In urban areas (largely group B, and part of A2), however, conflict incidence is not as important of a determining factor – the best results can be achieved by targeting people who have lost work or employment opportunities, irrespective of whether or not they have been affected by conflict.

The table below shows the characteristics of the population groups that have the lowest concentrations of priority households.


Which areas have less people in the priority group?
Rural Shocks_Conflict Shocks_Lost_Work %_Priority
no no no 18.47
no yes no 22.36
no yes yes 25.34
yes no no 34.51
Data sources: ACLED; acleddata.com and FAO-WFP


As mentioned earlier, rural households were found to be less resilient and more asset-poor in IFPRI’s household welfare survey. These findings align well with the process the Food Security Cluster has developed here to identify priority households (where rural households are significantly more likely to fall into the priority group). It would be extremely fruitful to explore whether this alignment extends to conflict and some of the other environmental and socioeconomic variables that have been employed in this document.

It should also be noted that townships themselves are quite large – on average, each has a population of 161,913 persons. In areas where partners are not yet present, this might necessitate an intermediate step to help partners identify specific areas where they could begin working.

In the ACLED dataset, of the 10,961 events in 2021, 38% of them were recorded with specific village locations.


Conflict events with and without villages
mentions_vilage events %_events fatalities %_fatalities locations
yes 4,142 37.79 5,348 48.57 1,919
no 6,819 62.21 5,664 51.43 300
Data source: ACLED; accleddata.com


Below is an interactive reference table of the 1,917 villages identified in the ACLED dataset, complete with coordinates. While this list does provide an excellent start, by working in these areas, partners should also endeavour to identify the specific locations of the remaining 62% of conflict events.




2.6a Sample beneficiary selection process

Though partners are likely to already have their own beneficiary targeting procedures, based on WFP’s guidelines, an example targeting procedure is listed below, with reference to the information presented in this document.

Sample targeting process

  • Identification and selection of geographical area

    • Selection of township, from the relevant prioritisation group
    • Coordination with the Cluster and Cluster partners to ensure that there are no duplications and that areas served are in line with the severity and magnitude of humanitarian needs
    • Ensure that programming is reflective of the context of the area targeted – rural or urban, conflict-affected or not
  • Set up complaints and feedback mechanism to ensure that issues and concerns raised by community members are addressed efficiently. The complaints and feedback mechanism will be active throughout this entire process.

  • Enumeration and headcount of community members

  • Formation of beneficiary targeting committee which is representative of the community

  • Presentation of draft targeting criteria, for example (selected from variables most predictive of food insecurity from section 1.1 and decision trees):

    • Households which have lost work of employment opportunities
    • Households which have suffered from natural hazards
    • Households which have been affected by conflict, insecurity or violence
    • Households which rely on casual, non-agricultural labour for income
    • Households which have no income source
    • Households whose head have no educational attainment
    • Households with children under 5
    • Households whose members include persons with disabilities
  • Sensitisation and consultation on beneficiary selection criteria and finalisation of selection criteria

  • Drafting of beneficiary list – only households which meet the established criteria should be included

  • Public posting of initial beneficiary list, for public comment, typically for a maximum of 5 days

  • Spot checks and verification on a random selection of selected households

  • Beneficiary registration



2.7 5W results from the first quarter of 2022

In the first quarter of 2022, Food Security Cluster partners have reached a total of 2,233,533 persons or 54.48% of the 2022 target. Below is an examination of the extent to which partners have targeted the townships most affected by conflict.

In this first plot, the number of beneficiaries reached in the first quarter of 2022 are plotted against the targeted population. Each point is a township and the red line down the middle represents reaching 100% of the target. How far above or below a township is indicates how far above or below the target it is. Additionally, the township prioritisation group each township belongs to is marked by the colour.

The townships on the far left of the plot have beneficiaries despite not having targets for 2022 (their targets have been nominally coded as \(1\) so they appear on the plot).



With reference to the table below, 2.8% of beneficiaries came from group A1 and 10.7% of beneficiaries came from group A2. On the surface, this seems like partners have made effort to reach conflict-affected townships. However, this reach has largely been due to oversubscription in Sittwe, where the number of beneficiaries reached in 152% of the target.

The development of the prioritisation groups also brings up the broader point of whether or not cluster targets are in line with needs and if they should be reformulated based on the information now available, as the targets in groups B and C are noticeably higher than those in group A. Nevertheless, it is hoped that partners will be able to afford townships in groups A1 and A2 greater coverage as the year progresses.


2022 Q1 beneficiaries and percent reached by prioritisation group
group beneficiaries %_ben target gap %_gap tsp_reached tsp_total
A1 64,544 2.89 178,541 134,400 75.28 7 20
A2 239,406 10.72 465,799 322,637 69.27 17 49
B 1,453,511 65.08 1,432,480 983,963 68.69 3 33
C 249,727 11.18 1,588,332 1,480,916 93.24 24 172
D 226,345 10.13 434,862 356,379 81.95 20 56
Any reach above 100% is counted as 100%; exceeding the target in one township does not affect other townships


Overall, these are not encouraging results. Notably, only 3,450 beneficiaries being reached in the whole of Sagaing, where the fighting has been heaviest. It is recommended that targets and plans for the Food Security Cluster be reviewed, and partners be reminded to reallocate resources away from oversubscribed areas and away from group C, which are neither humanitarian nor development priorities.



2.8 Maps of conflict scores and prioritisation group



2.9 Reference table for conflict variables

Below is an interactive reference table for the various types of conflict events by township. It also includes the overall conflict score and prioritisation groups. The search bar can be used to find specific townships, or any of the columns may be sorted according to ascending or descending values. The table currently shows townships in descending order of conflict score.





3. Distribution of flood risk in Myanmar

3.1 Historical flood data

In light of the impending monsoon season, the probability that a township will be affected by a major flood or cylconic event has been calculated. Major floods since 2008 have been factored into this calculation.

For the moment, conflict incidence and flood and cyclone risk will be evaluated separately. Flood and storm surge risk exist as probabilities for the moment and are intended to support prepositioning and Disaster Risk Reduction. This might change were severe flooding to occur in 2022.



Based on this data, a score was calculated for each township based on how many times it had been affected by floods since 2008. The table below also summarises the number of people in need (2022). 2,210,725 people live in townships that have flooded more than 5 times since 2008.

Summary statistics by number of floods (2008-2021)
flood_count townships people_in_need
9 1 55,490
8 4 333,874
7 7 292,124
6 15 565,632
5 23 963,605
4 33 1,769,108
3 54 2,735,536
2 70 2,994,977
1 73 2,213,089
0 50 1,299,033
Data source: MIMU and UNDP



3.2 Map of flood risk

The map below shows the probability of each township being affected by floods. The areas with the greatest risk of flooding are in Mon, near the mouth of the Sittaung River and the Gulf of Mottama and those along the Ayeyarwady River, and to a lesser extent, along the Chindwin River.




3.3 Reference table for flood risk

Below is an interactive reference table for flood risk by township. It includes the number of times since 2008 a township has been affected by flooding (flood_count) and the probability of flooding (flood_risk). Similar to the interactive table in the previous chapter, the search bar can be used to find specific townships and any of the columns may be sorted according to ascending or descending values. The table currently shows townships sorted in descending order of flood risk.





4. Technical notes

These annexes contain additional technical information that informed the decisions in the earlier sections.

4.1 Limitations and next steps


FAO-WFP food security survey

The most important limitation of the FAO-WFP survey was the exclusion of several key states and regions from the survey. Of particular interest are Sagaing, Magway and Mandalay where the conflict has been particularly intense.

Furthermore, the dry zone was not surveyed. From an agricultural perspective, this is a major omission as the diversity of crops and, consequently, diets are much higher in the dry zone than in the other parts of the country, which are predominantly focused on paddy.

Additionally, the targeting process proposed in this document has not yet been trialled in the field. The Food Security Cluster does not have the resources to undertake a field test of this scale. However, every attempt has been made to corroborate the data presented in it.

In spite of these major limitations and the numerous assumptions that have had to made, the FAO-WFP survey is most comprehensive dataset on food security that has been collected so far. Additional efforts will be made to cross-reference these data from those of other surveys. These models will be updated once the third round of the FAO-WFP survey is ready. As a final point in this section, the FAO-WFP survey, in spite of its limitations forms the basis of the People in Need calculations, which underpins a lot of the response.

This paper serves a proposal on how vulnerable households (such as those in the priority group) may be identified and targeted. Should this methodology prove sound and viable, it is suggested that it be applied to either IFPRI’s Household Welfare Survey as well as data collected in the third round of the FAO-WFP food security survey.



MIMU-HARP vulnerability analysis

The results of the 2015 MIMU-HARP Vulnerability Analysis (used to inform scores for multidimensional vulnerability), have been updated using the 2019 Inter-Censal Survey results using the following formula:

\[ 2019TspValue = 2015TspValue / 2015DistrictValue \cdot 2019DistrictValue \]

This allows the new township values to tally with the 2019 district-level inter-censal survey results as well as to preserve the order and relationships of townships within each district. For the two indicators in the 2015 dataset but not covered in the 2019 inter-censal survey, they were forward filled, using their 2015 values. To further improve these estimations, multiple imputation should be employed. But that will be left for any subsequent revisions to this document.



Conflict data and ACLED

Perhaps the most key limitation has also been the lack of field access and detailed assessment data from many parts of the country. With the conflict ongoing, and the footprints of humanitarian agencies largely skewed towards Yangon and Rakhine, which have been comparatively less affected by the current crisis, there is a demonstrable dearth of first-hand information in key areas. This document has intended to circumvent this through the use of ACLED data, which is the most complete set of conflict incident data in Myanmar.

However, the ACLED dataset is not without its limitations – the majority of its information, about 85 percent, comes from subnational, national and international media sources. The remainder comes from ACLED’s partner, the Myanmar Peace Monitor, and reports from UN agencies, international monitoring groups, and local human rights organisations. The completeness of the conflict data and how representative it is of the situation on the ground is not something that is easily verifiable. Though it should be noted that ACLED still has the largest and most comprehensive dataset of conflict incidents in Myanmar.



Targeting limitations

The targeting processes proposed in this document have not yet been field-tested. Whilst the Food Security Cluster is confident of its analysis, it does not have the resources to undertake a field trial to understand how what has been proposed here will impact operations. Access to affected areas has also not been considered.

Finally, the prioritisation process is not entirely seamless – whilst it is possible to prioritise from the union-level down to the township level and from the village down to the household with few issues, there remains a gap between the township and the village. In large part, this is due to the dearth of data available at the village-tract and village levels. The Food Security Cluster has partially mitigated this through the provision of village names listed in the ACLED dataset, but this is also a partial list and not fully matched at the village-tract levels either. Unless there is a full enumeration of village tracts and villages in Myanmar, this issue is likely to persist.



4.2 Food insecurity score

Food insecurity here is not recognised solely as a food secure/food insecure dichotomy and is instead treated as a matter of degree determined by an household’s position the overall distribution of food security indicators. According to Betti and Verma, “by appropriately weighting non-monetary indicators of deprivation to reflect their dispersion and correlation, it is possible to construct quantitative indices of deprivation in its various dimensions”. At its most basic, the food insecurity score is a weighted sum of its component indicators that rewards variables that are responsible for more of the variation in the dataset.

To calculate the Food Insecurity Score, variables within the Livelihood Coping Strategies Index, Food Consumption Score and Food Insecurity Experience scale were rescaled to values between \(0\) and \(1\). For most of the variables, \(1\) meant “yes” and \(0\) meant “no”; however, for the Food Consumption Score indicators, \(0\) meant that a household had consumed that food group every day for the past 7 days and \(1\) meant that a household had not consumed that food group for any of the prior 7 days. Additionally, coping strategies indicators at the “crisis” level were coded as \(0.5\) for “yes” and indicators at the “emergency” level were coded as \(1\) for “yes”. This is to replace the typical severity weighting that would normally have occurred but was not part of the FAO-WFP survey process.

These calculations have been encoded in the R package mdepriv by Attilio and Aldo Benini, building on work done by Alperin and Van Kerm on their mdepriv Stata package. According to Alperin and Van Kerm, “mdepriv creates synthetic scores of deprivation that are linear functions of uni-dimensional deprivation items measured on the [0, 1] scale. With \(x_{ij}\) denoting the value of a particular deprivation items \(j\) for observation \(i\). The synthetic scores calculated by mdepriv are weighted sums of the \(M\) deprivation items:

\[ s_i = \sum_{j = 1}^{M}{w_j}x_{ij} \]

A food insecurity score was then calculated using the weighted sum of the individual variables. The weighting scheme chosen was the default for the mdepriv package, Cerioli and Zani (1990) weighting. Under this weighting scheme, variable weights are “frequency-based” and are a function of their sample mean but just slightly modified to give a stronger weight to relatively rare items so that variation is rewarded:

\[ w_j \propto log(\frac{1}{\bar{x}_j}) \]

The specific code for generating the food insecurity score was:

food_insecurity_score <- survey %>% 
  mutate_at(vars(cs_crisis_sold_prod_assets, cs_crisis_no_school, 
                 cs_crisis_reduced_health_exp), ~ . * 0.5) %>% 
  select(survey_id, hhfcs_inv, # inverse of food consumption score
         cs_crisis_sold_prod_assets, cs_crisis_no_school, 
         cs_crisis_reduced_health_exp, cs_emergency_sold_house, 
         cs_emergency_hh_migration, cs_emergency_hh_risk,
         fies_fewfood, fies_skipped, fies_whlday, fies_hungry, 
         fies_ateless, fies_healthy, fies_runout, fies_worried, combined_wt) %>% 
  mdepriv(c("hhfcs_inv", 
         "cs_crisis_sold_prod_assets", "cs_crisis_no_school", 
         "cs_crisis_reduced_health_exp", "cs_emergency_sold_house", 
         "cs_emergency_hh_migration", "cs_emergency_hh_risk",
         "fies_fewfood", "fies_skipped", "fies_whlday", "fies_hungry", 
         "fies_ateless", "fies_healthy", "fies_runout", "fies_worried"), 
         output = "all", sampling_weights = "combined_wt")



4.3 Limited performance of linear regression models

Prior to the development of the decision trees detailed in section 1.3, a linear regression model had been considered. However, no matter how sophisticated the method, the highest r-squared achieved was 0.136 (a model with an r-squared with 1 would mean it was perfectly predictive and an r-squared of 0 means that a model does not predict any of the variance at all; in this case, the best linear regression model still left 86% of the variance unexplained).

With reference to the plot below of predicted values of the food security score vs the actual values (of the best performing model), whilst there is a significant relationship, its predictive performance is too weak. This was why decision trees were employed instead of a linear model. Each point in the plot below is a household in the survey.



The relatively low r-squared is likely due to missing variables in the FAO-WFP survey. For instance, the factors that go into a household’s decision to reduce their healthcare expenditures are too multifarious to be captured by a survey on food security.

This pattern was also true for the overall food security score as well as the individual food security indices (Food Consumption Score, Food Insecurity Experience Scale and Coping Strategies Index).

For additional reference, the table below shows the top 15 variables ranked by the percentage of variance of the food security score that they explain.

Variables ranked by percent of variance
term pc_variance
Residuals 85.66
shocks_none 5.89
income_ms_stable_non_ag 1.74
shocks_lostwork 1.16
rural 0.91
shocks_conflict 0.68
not_improved_sanitation 0.59
income_ms_selfem_nonwork 0.54
age_18_40 0.46
shocks_naturalhazard 0.43
income_ms_agriculture 0.39
income_ms_livestock 0.31
shocks_pricesother 0.29
age_41_65 0.25
shocks_sicknessdeath 0.23


As a point of reference, below is the code for the decision tree.

tree_environment_circumstances <- survey %>% 
  rpart(priority ~  shocks_lostwork + shocks_sicknessdeath + shocks_foodprices + 
          shocks_conflict + shocks_cantworkbusiness + shocks_naturalhazard + 
          shocks_accessmarket + shocks_pricesother + shocks_other + 
          rural + not_improved_sanitation,
        data = .,  weights = combined_wt , cp = 0.01)

tree_demographics <- survey %>% 
  mutate(children_0_4 = ifelse(children_0_4 > 0, 1, 0)) %>% 
  rpart(priority ~ children_0_4 + hoh_female + hoh_education + hoh_age, 
        data = ., weights = combined_wt, cp = .0088)



4.4 Pairwise correlations of indicators in the FAO-WFP survey

The table below shows the top 20 correlations between the different variables collected by the FAO-WFP food security survey. When a particular variable is of interest, it may be useful and interesting to see which other variables it is most correlated with.

The vast majority of the variables, with the exception of composite scores and indices, are binary, with 1 being true and 0 being false. In these cases, a correlation of 1 would mean that a variable is true for all the households in which its match is also true; a correlation of 0.5 means that a variable is true for 50% of the households in which its match is true.

For instance, whether or not the head of the household has competed education (edu_higher) is most correlated with households whose main income source is stable non-agricultural income (income_ms_stable_non_ag). Similarly, a correlation of 0.175 between rural and edu_none indicates that in 17.5% of all rural households, the head of household has no formal educational attainment. It also should be noted that, as expected, the components of each of the food security indices are most correlated with each other.




4.5 Calculating the conflict score

The conflict score here originally appeared in the Food Security Cluster’s report Understanding Conflict Dynamics in Myanmar through Conflict and Incident Data: A Food Security Perspective. It was calculated using ACLED data and yields a score for each township.

The conflict score is an update of the conflict index in the MIMU-HARP Vulnerability Analysis, using 2021 data. The specific conflict variables that included in the score were battles, explosions and remote violence, non-peaceful protests and riots, conflict-related fatalities, strategic developments and number of IDPs. Only the version of the conflict score within this document takes IDPs into account; in the Food Security Cluster’s earlier report on conflict (linked in the previous paragraph), the Cluster did not yet have access to detailed IDP data.

The conflict score is an average of the normalised values of key conflict indicators. Its main use it to aid decisions about geographic prioritisation. These normalised values have been re-weighted with Betti-Verma method, which penalises redundancy and rewards variation; this is the only notable divergence from the MIMU-HARP methodology. The Betti-Verma method was employed through the mdepriv package developed by Attilio and Aldo Benini.

Unlike the Food Insecurity Score, the component variables of the conflict index were not binary, meaning that it was possible to take advantage of the Betti-Verma method’s double-weighting rule which is sensitive to both the relative frequency of the variables and the correlation amongst variables:

\[ w_j \propto (w_j^a \cdot w_j^b) \]

However, it must be noted that whilst the scores themselves can be shared and used, replicating all the calculations will necessitate obtaining permission from UNHCR as their township-level breakdown of IDP populations have not been shared publicly. The specific code for calculating the conflict score can be found below.


# Betti-Verma calculations and the construction of the conflict score
index_shares2 <- conflict_df2 %>%   
  mutate_at(vars(c(battles, explosions_remote_violence, violence_against_civilians, 
                   fatalities, strategic_developments, 
                   protests_and_riots, total_idps)), 
           scale) %>%  
  mutate_at(vars(c(battles, explosions_remote_violence, violence_against_civilians, 
                   fatalities, strategic_developments, 
                   protests_and_riots, total_idps)), 
           funs((. - min(., na.rm = T))/(max(., na.rm = T) - min(., na.rm = T)))) %>% 
  mdepriv(c("battles", "explosions_remote_violence", 
            "violence_against_civilians", "fatalities", 
            "strategic_developments", "protests_and_riots", "total_idps"),
          # IDP counts have been used in the score, but will not be read 
          # into this report
          method = "bv", output = "all")
---
title: "Township Prioritisation and Establishing a Beneficiary Profile"
author: "Myanmar Food Security Cluster"
date: "25/03/2022"
output: 
  html_document:
    code_download: true
    theme: readable
    toc: true
    toc_depth: 4
    toc_float: true
    number_sections: false
    collapsed: false
always_allow_html: true   
---

```{css, echo=FALSE}

#TOC::before {
  content: "";
  display: block;
  height: 70px;
  margin: 2em 20px 40px 20px;
  background-image: url("Myanmar_cluster_blue.png");
  background-size: contain;
  background-position: center center;
  background-repeat: no-repeat;
}
```

```{=html}
<style>
    body .main-container {
        max-width: 1280px;
    }
</style>
```


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, fig.width=9, message = FALSE, warning=FALSE)
library(tidyverse)
library(readxl)
library(haven)
library(lubridate)
library(stringi)
library(pander)
library(janitor)
library(scales)
library(magrittr)
library(sf)
library(bookdown)
library(patchwork)
library(kableExtra)
library(DT)
library(viridis)
library(mdepriv)
library(psych)
library(widyr)
library(rpart)
library(rpart.utils)
library(rattle)
library(broomstick)
library(corrplot)
library(broom)
library(tidytext)
library(plotly)
library(ggridges)
library(RColorBrewer)
library(ggforce)

theme_set(theme_minimal())

# disabling scientific notation
options(scipen = 999)

# pander tables all in one row
panderOptions('table.split.table', Inf)

# pander thousands separator
panderOptions("big.mark", ",")

# replace 
opts <- options(knitr.kable.NA = "")

`%out%` <- Negate(`%in%`)

# function beneficiary summaries
sum_ben <- function(df, column_var){
  
  column_var <- enquo(column_var)
  
  df %>%
    group_by(!!column_var) %>% # must add bang-bang
    summarise(beneficiaries = sum(beneficiaries)) %>% 
    arrange(desc(beneficiaries))
    
}

# function beneficiary summaries, 2 grouped variables
sum_ben2 <- function(df, column_var1, column_var2){
  
  column_var1 <- enquo(column_var1)
  column_var2 <- enquo(column_var2)
  
  df %>%
    group_by(!!column_var1, !!column_var2) %>% # must add bang-bang
    summarise(beneficiaries = sum(beneficiaries)) %>% 
    arrange(desc(beneficiaries))
    
}

# scaling functions 
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
range_wna <- function(x){(x-min(x, na.rm = TRUE))/(max(x, na.rm = TRUE)-min(x, na.rm = TRUE))}

#mode function 
mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# show_col(viridis_pal(option = "cividis")(10))
```

```{r df-pcodes-shapefiles}
# pcodes
pcodes <- read_excel("Myanmar PCodes Release_9.3_Jan2021_(StRgn_Dist_Tsp_Town_Ward_VT).xlsx",
                     sheet = "03_Township") %>%
  rename(admin1_pcode = SR_Pcode,
         state = SR_Name_Eng,
         admin3_pcode = Tsp_Pcode,
         township = Township_Name_Eng,
         admin2_pcode = `District/SAZ_Pcode`, 
         district = `District/SAZ_Name_Eng`)

# shapefiles
pcode3_shape <- st_read("./mmr_polbnda_adm3_mimu_250k/mmr_polbnda_adm3_mimu_250k.shp", 
                        quiet = TRUE) %>% 
 rename(state = ST, 
        admin1_pcode = ST_PCODE,
        township = TS,
        admin3_pcode = TS_PCODE) %>% 
 mutate(admin3_pcode = ifelse(str_detect(township, "Hlaingtharya"), "MMR013008", admin3_pcode))

pcode1_shape <- st_read("./mmr_polbnda2_adm1_mimu_250k/mmr_polbnda2_adm1_mimu_250k.shp", quiet = TRUE) %>% 
 rename(state = ST, 
        admin1_pcode = ST_PCODE) %>% st_as_sf()

```

```{r df-fsc-vulmmr-acled}
# reading in 5Ws 2021
fsc_2021 <- read_csv("fsc5w_2021.csv")

# 2022 Q1 5Ws
fsc <- read_csv("fsc_q1_2022.csv")

# reading in vulnerability dataset
vulmmr <- read_excel("Datasets_Vulnerability_Analysis_in_Myanmar_09Jul2018 (1).xlsx",
           skip = 1) %>% 
  slice(-c(1:3)) %>% 
  clean_names() %>% 
  select(-label) %>% 
  mutate_at(vars(number_of_village_tracts:wb_wealth_rank), as.numeric) %>% 
  mutate_at(vars(disasters_impacted_by_nargis_2008:acled_2015_2016_data_exists), as.logical) %>% 
  mutate_at(vars(conflict_2015_2016_number_of_battles:corrected_conflict_index_garry), as.numeric) %>% 
  select(-starts_with("x")) %>% 
  select(-c(private_sector_development_2014_2015, protection_2010_2015, shelter_2010_2015, wash_2010_2015))


# ACLED dataset
acled <- read_csv("acled_new.csv") %>% 
  mutate(has_village = if_else(location != admin3, "yes", "no"))
```

```{r df-fs-pin-and-dependencies}
# reading in conflict score
conflict_score <- read_csv("conflict_score2.csv")

# show_col(viridis_pal()(10))

floods_storm_surge <- read_excel("Flood_Affected_Township_(2008-2021) (1).xlsx") %>% 
  clean_names() %>% 
  rename(state = st, township = ts, admin1_pcode = st_pcode, admin3_pcode = ts_pcode) %>% 
  mutate(year_2008_storm_surge = ifelse(!is.na(storm_surg), 1, 0)) %>%
  mutate(year_2008 = ifelse(str_detect(flood_year, "2008"), 1, 0),
         year_2009 = ifelse(str_detect(flood_year, "2009"), 1, 0),
         year_2010 = ifelse(str_detect(flood_year, "2010"), 1, 0),
         year_2011 = ifelse(str_detect(flood_year, "2011"), 1, 0),
         year_2012 = ifelse(str_detect(flood_year, "2012"), 1, 0),
         year_2013 = ifelse(str_detect(flood_year, "2013"), 1, 0),
         year_2014 = ifelse(str_detect(flood_year, "2014"), 1, 0),
         year_2015 = ifelse(str_detect(flood_year, "2015"), 1, 0),
         year_2016 = ifelse(str_detect(flood_year, "2016"), 1, 0),
         year_2017 = ifelse(str_detect(flood_year, "2017"), 1, 0),
         year_2018 = ifelse(str_detect(flood_year, "2018"), 1, 0),
         year_2019 = ifelse(str_detect(flood_year, "2019"), 1, 0),
         year_2020 = ifelse(str_detect(flood_year, "2020"), 1, 0),
         year_2021 = ifelse(str_detect(flood_year, "2021"), 1, 0)) %>%
  rename(floodyears = flood_year) %>% 
  mutate(
    flood_prob = select(., starts_with("year_")) %>% rowMeans(na.rm = TRUE),
    flood_count = select(., starts_with("year_")) %>% rowSums(na.rm = TRUE))

# We can try again later, but I don't think we need the vul_score
# Since we are going in a different direction 
vul_share <- vulmmr %>% select(avg_safe_sanitation_improved_drinking_water:female_literacy_percent_literate, 
                  child_dependency_ratio_31, unpaid_family_worker_percent,
                  admin3_pcode = township_pcode) %>% 
  left_join(conflict_score %>% select(conflict_score, admin3_pcode), by = "admin3_pcode") %>% 
  select(-conflict_index_c_percent_of_max_inverse_percent_of_average_of_envelopes_12, 
         -unpaid_family_workers_inverse, -child_dependency_ratio_inverse) %>% 
  mutate_at(vars(avg_safe_sanitation_improved_drinking_water:female_literacy_percent_literate), ~ 1 - .x) %>%
  mutate(conflict_env = range_wna(conflict_score)) %>%
  filter(!is.na(admin3_pcode)) %>%  
  replace(is.na(.), 0) %>% 
  mdepriv(c("avg_safe_sanitation_improved_drinking_water", "average_good_roof_and_wall", 
            "highest_education_at_least_middle_school_percent_14", "electricity_for_lighting_percent_15", 
            "id_card_with_id_total_percent_16", "female_literacy_percent_literate", 
            "child_dependency_ratio_31", "unpaid_family_worker_percent", "conflict_env"), 
          method = "cz", output = "all", 
          score_i_heading = "vul_score")

# main township-level dataset
fs_pin <- read_excel("hpc_pin.xlsx") %>%  
  left_join(fsc %>% group_by(admin3_pcode_old) %>% 
              summarise(beneficiaries = sum(new_beneficiaries)),
            by = c("admin3_pcode" = "admin3_pcode_old")) %>% 
  left_join(conflict_score %>% select(admin3_pcode, battles, explosions_remote_violence, protests_and_riots, 
                                      strategic_developments, violence_against_civilians, 
                                      conflict_score, fatalities),
            by = "admin3_pcode") %>% 
  left_join(vulmmr %>% select(admin3_pcode = township_pcode, 
                              admin2_pcode = district_pcode, total_population_2015 = total_pop_both_sexes,  
                     area_sown_acres = all_area_sowed_mali, population_density, vulnerability_band, 
                     old_conflict_score = conflict_index_a_average_of_percent_of_totals,
                     old_conflict_env = conflict_index_b_average_of_envelopes,
                     approximate_vulnerable_population ),
            by = "admin3_pcode") %>% 
  left_join(floods_storm_surge %>% 
              select(admin3_pcode, flood_prob, flood_count),
            by = "admin3_pcode") %>% 
  left_join(fsc_2021 %>%
              filter(unique_beneficiaries == "Yes") %>%
              group_by(admin3_pcode) %>%
              summarise(beneficiaries_2021 = sum(beneficiaries),
                        partners_2021 = n_distinct(org_code)),
            by = "admin3_pcode") %>% 
  mutate(fs_pin = ifelse(fs_pin > population_2021_proj, population_2021_proj, fs_pin)) %>% 
  replace_na(list(conflict_score = 0, beneficiaries = 0, beneficiaries_2021 = 0, partners_2021 = 0, flood_prob = 0)) %>% 
  mutate(conflict_ranking = dense_rank(desc(conflict_score)),
         flood_ranking = dense_rank(desc(flood_prob)),
         fs_pin_ranking = dense_rank(desc(fs_pin)),
         fs_target_ranking = dense_rank(desc(fs_targeted)),
         fs_pin_pc = fs_pin / sum(fs_pin),
         pin_pop = fs_pin / population_2021_proj) %>% 
  mutate(old_group = case_when(conflict_score >= mean(conflict_score) & population_density >= 1000 ~ "A1",
                           conflict_score >= mean(conflict_score) & population_density < 1000 ~ "A2",
                           conflict_score < mean(conflict_score) & population_density >= 1000 ~ "B1",
                           conflict_score < mean(conflict_score) & population_density < 1000 ~ "B2")) %>% 
  mutate(band_text = case_when(vulnerability_band == 1 ~ "1. Extreme outliers, underdevelopment and conflict", 
                               vulnerability_band == 2 ~ "2. Conflict-affected, poor human development", 
                               vulnerability_band == 3 ~ "3. Hubs in conflict-affected areas", 
                               vulnerability_band == 4 ~ "4. Very low access to services and infrastructure", 
                               vulnerability_band == 5 ~ "5. Agricultural areas with high profits", 
                               vulnerability_band == 6 ~ "6. Secondary cities/towns in agricultural areas", 
                               vulnerability_band == 7 ~ "7. Up-and-coming peri-urban and urban areas", 
                               vulnerability_band == 8 ~ "8. Affluent, urban core")) %>% 
  left_join(vul_share$data %>%  select(-conflict_score), by = "admin3_pcode") %>%
  mutate(mdp_score_old = (avg_safe_sanitation_improved_drinking_water + average_good_roof_and_wall + 
           highest_education_at_least_middle_school_percent_14 + electricity_for_lighting_percent_15 + 
           id_card_with_id_total_percent_16 + female_literacy_percent_literate + 
           child_dependency_ratio_31 + unpaid_family_worker_percent) / 8) %>% 
  ungroup()

# 2019 intercensal district-level values
district_ref <- fs_pin %>%  
  select(state, admin2_pcode, admin3_pcode, total_population_2015,
         avg_safe_sanitation_improved_drinking_water:unpaid_family_worker_percent) %>% 
  group_by(admin2_pcode) %>% 
  mutate_at(vars(avg_safe_sanitation_improved_drinking_water:unpaid_family_worker_percent), 
            ~ .x * total_population_2015) %>% 
  relocate(total_population_2015, .after = unpaid_family_worker_percent) %>% 
  summarise(across(avg_safe_sanitation_improved_drinking_water:total_population_2015, ~sum(.))) %>% 
  mutate_at(vars(avg_safe_sanitation_improved_drinking_water:unpaid_family_worker_percent), 
            ~ .x / total_population_2015) %>% 
  left_join(
    read_excel(
      "Dataset_Climate, Environmental Degradation and Disaster Risk in Myanmar_May2022.xlsx",
      skip = 2,
      sheet = "2016-21 Vulnerability"
    ) %>%
      slice(-1) %>%
      clean_names() %>%
      select(-label) %>%  
  rename(state = state_region_name, admin1_pcode = state_region_pcode, 
         district = district_name, admin2_pcode = district_pcode) %>%
      select(admin2_pcode, avg_safe_sanitation_improved_drinking_water_21:female_literacy_percent_literate_29) %>% 
      select(-conflict_index_c_percent_of_max_inverse_percent_of_average_of_envelopes_23) %>% 
      na.omit() %>% 
      mutate_at(vars(avg_safe_sanitation_improved_drinking_water_21:female_literacy_percent_literate_29), ~ as.numeric(.x)) %>% 
      mutate_at(vars(avg_safe_sanitation_improved_drinking_water_21:female_literacy_percent_literate_29), ~ 1 - .x), 
    by = "admin2_pcode") %>% 
  select(admin2_pcode, 
         d_water_2015 = avg_safe_sanitation_improved_drinking_water, 
         d_house_2015 = average_good_roof_and_wall, 
         d_electricity_2015 = electricity_for_lighting_percent_15, 
         d_literacy_2015 = female_literacy_percent_literate, 
         d_child_2015 = child_dependency_ratio_31, 
         d_unpaid_2015 = unpaid_family_worker_percent, 
         d_water_2019 = avg_safe_sanitation_improved_drinking_water_21, 
         d_house_2019 = average_good_roof_and_wall_22  , 
         d_electricity_2019 = electricity_for_lighting_percent_26, 
         d_literacy_2019 = female_literacy_percent_literate_29, 
         d_child_2019 = child_dependency_ratio_inverse_28, 
         d_unpaid_2019 = unpaid_family_workers_inverse_24) 

# write it back into fs_pin 
fs_pin <- fs_pin %>% 
  left_join(district_ref, by = "admin2_pcode") %>% 
  mutate(water_2019 = ifelse(!is.na(d_water_2019),
                                 avg_safe_sanitation_improved_drinking_water / d_water_2015 * d_water_2019, 
                                 avg_safe_sanitation_improved_drinking_water), 
         house_2019 = ifelse(!is.na(d_house_2019), 
                                 average_good_roof_and_wall / d_house_2015 * d_water_2019, 
                                 average_good_roof_and_wall), 
         edu_2019 = highest_education_at_least_middle_school_percent_14, 
         electricity_2019 = ifelse(!is.na(d_electricity_2019),
                                       electricity_for_lighting_percent_15 / d_electricity_2015 * d_electricity_2019, 
                                       electricity_for_lighting_percent_15), 
         id_card_2019 = id_card_with_id_total_percent_16, 
         female_literacy_2019 = ifelse(!is.na(d_literacy_2019), 
                                       female_literacy_percent_literate / d_literacy_2015 * d_literacy_2019,
                                       female_literacy_percent_literate), 
         child_dependency_2019 = ifelse(!is.na(d_child_2019), 
                                        child_dependency_ratio_31 / d_child_2015 * d_child_2019, 
                                        child_dependency_ratio_31), 
         unpaid_2019 = ifelse(!is.na(d_unpaid_2019),
                              unpaid_family_worker_percent / d_unpaid_2015 * d_unpaid_2019, 
                              unpaid_family_worker_percent)) %>% 
  mutate(mdp_adjust = (water_2019 + house_2019 + edu_2019 + electricity_2019 + 
                       id_card_2019 + female_literacy_2019 + child_dependency_2019 + unpaid_2019) / 8) %>% 
  ungroup()

# k-means clustering and output
set.seed(123)

km_res2 <- fs_pin %>% 
  replace_na(list(fatalities = 0)) %>% 
  select(population_density, conflict_score, mdp_adjust, fatalities) %>%
  mutate_at(vars(population_density, conflict_score, mdp_adjust, fatalities), ~ range_wna(.x)) %>% 
  kmeans(5, nstart = 25)

# reading it back into fs_pin
fs_pin <- fs_pin %>% 
  cbind(cluster = km_res2$cluster) %>% 
  mutate(cluster = recode(cluster, 
                          `1` = "A1",`4` = "A2", 
                          `2` = "B", `3` = "C", `5` = "D"))

```


```{r df-fao-wfp-survey}

# should i just read in the whole survey dataset here? or just read in the csv
# i don't really know if people are interested in the cleaning process

fao_survey <- read_dta("MMR_R2HQ_cleanedWt_final.dta") %>% 
  zap_labels()

survey <- fao_survey %>% 
  rename(admin3_pcode = adm3_pcode) %>% 
  left_join(pcodes %>% select(admin3_pcode, township, state, admin1_pcode, district, admin2_pcode),
            by = "admin3_pcode") %>% 
  mutate_at(vars(cs_stress_hh_assets:cs_emergency_hh_risk), 
            ~case_when(. == 4 ~ NA_real_,
                       TRUE ~ .)) %>% 
  mutate_at(vars(cs_stress_hh_assets:cs_emergency_hh_risk), 
            ~case_when(. == 1 ~ 1,
                       . == 2 ~ 0, 
                       . == 1 ~ 1, 
                       TRUE ~ 0)) %>%
  mutate(lhcsi_max = case_when(lhcsi_max_ag == "None" ~ 0,
                                        lhcsi_max_ag == "Stress" ~ 1,
                                        lhcsi_max_ag == "Crisis" ~ 2,
                                        lhcsi_max_ag == "Emergency" ~ 3),
                  lhcsi_range = range01(lhcsi_max)) %>%
  mutate(csi_weighted = (cs_crisis_sold_prod_assets + cs_crisis_no_school + cs_crisis_reduced_health_exp) * 0.5 + 
           (cs_emergency_sold_house + cs_emergency_hh_migration + cs_emergency_hh_risk) * 1,
         csi_weighted = range_wna(csi_weighted)) %>% 
  mutate_at(vars(hh_gender, fewfood, skipped, whlday, hungry, ateless, healthy, runout, worried, 
                 crp_salesdif, accessmarket, need, crp_proddif, cropdifficultyunrest, cropsalesdifficultyunrest), 
            ~case_when(. == 3 ~ NA_real_, 
                       . == 4 ~ NA_real_, 
                       TRUE ~ .)) %>% 
  mutate(hoh_female = case_when(hh_gender == 1 ~ 0, 
                                hh_gender == 2 ~ 1,
                                TRUE ~ hh_gender)) %>%
  mutate_at(vars(fies_fewfood = fewfood, fies_skipped = skipped, 
                 fies_whlday = whlday, fies_hungry = hungry,
                 fies_ateless = ateless, fies_healthy = healthy,
                 fies_runout = runout, fies_worried = worried,
                 accessmarket, need, crp_salesdif, crp_proddif,
                 cropdifficultyunrest, cropsalesdifficultyunrest),
            ~case_when(. == 1 ~ 1, 
                       . == 2 ~ 0, 
                       TRUE ~ 0)) %>%
  mutate(no_accessmarket = ifelse(accessmarket == 1, 0, 1)) %>% 
  # be careful here -- you've imputed missing fies data as 0
  # this is the fies raw score, not sure if i need any additional transformations at the hhd level
  mutate(fies_raw = fies_fewfood + fies_skipped + fies_whlday + fies_hungry +
           fies_ateless + fies_healthy + fies_runout + fies_worried, 
         fies_raw_range = range_wna(fies_raw)) %>% 
  mutate_at(vars(hh_wealth_toilet, hh_residencetype, hh_education,
                 income_main_comp, income_sec_comp, income_third_comp, 
                 crp_area_change, crp_harv_change, crp_salesprice),
            ~case_when(. == 6 ~ NA_real_,
                       . == 7 ~ NA_real_, 
                       TRUE ~ .)) %>% 
  mutate_at(vars(income_main_comp, income_sec_comp, income_third_comp, 
                 crp_area_change, crp_harv_change, crp_salesprice),
            ~range_wna(.)) %>%   
  mutate(not_improved_sanitation = case_when(hh_wealth_toilet == 1 ~ 0,
                                                       hh_wealth_toilet == 2 ~ 0,
                                                       hh_wealth_toilet == 3 ~ 1,
                                                       hh_wealth_toilet == 4 ~ 1,
                                                       hh_wealth_toilet == 5 ~ 1)) %>% 
  mutate(residence_type = case_when(hh_residencetype == 1 ~ "residence_permanent",
                                    hh_residencetype == 2 ~ "residence_recent_migrant",
                                    hh_residencetype == 3 ~ "residence_returnee",
                                    hh_residencetype == 4 ~ "residence_idp",
                                    hh_residencetype == 5 ~ "residence_refugee",
                                  TRUE ~ NA_character_)) %>% 
  mutate(hoh_education = case_when(hh_education == 1 ~ "edu_none",
                                   hh_education == 2 ~ "edu_primary",
                                   hh_education == 3 ~ "edu_secondary",
                                   hh_education == 4 ~ "edu_higher",
                                   hh_education == 5 ~ "edu_religious",
                                   TRUE ~ NA_character_)) %>%
  mutate_at(vars(hh_age, incomechangereduced, wheremigration, meals, hh_agricactivity),
            ~case_when(. == 5 ~ NA_real_,
                       . == 6 ~ NA_real_,
                       TRUE ~ .)) %>% 
  mutate(hoh_age = case_when(hh_age == 1 ~ "age_under18",
                             hh_age == 2 ~ "age_18_40",
                             hh_age == 3 ~ "age_41_65",
                             hh_age == 4 ~ "age_over65",
                             TRUE ~ NA_character_)) %>% 
    mutate(meals_inv = case_when(meals == 4 ~ 1, 
                               meals == 3 ~ 2,
                               meals == 2 ~ 3, 
                               meals == 1 ~ 4,
                               TRUE ~ meals),
         meals_inv = range_wna(meals_inv)) %>% 
  mutate(income_reduced = case_when(incomechangereduced == 4 ~ 1,
                                    incomechangereduced == 3 ~ 2,
                                    incomechangereduced == 2 ~ 3, 
                                    incomechangereduced == 1 ~ 4,
                                    TRUE ~ incomechangereduced),
         income_reduced = range_wna(income_reduced)) %>% 
  # you've changed all the NAs in agriactivity to 0
  mutate(agri_activity = case_when(hh_agricactivity %in% c(1, 2, 3) ~ 1, 
                                       TRUE ~ 0)) %>%
  # fies indicators     
  mutate_at(vars(fies_ranout_hhs, fies_hungry_hhs, fies_whlday_hhs),
                ~case_when(. == 4 ~ NA_real_,
                           . == 5 ~ NA_real_,
                           TRUE ~ .)) %>% 
      mutate_at(vars(fies_hungry_hhs, fies_ranout_hhs, fies_whlday_hhs),
                ~case_when(. == 3 ~ 1, 
                           . == 2 ~ 0.5, 
                           . == 1 ~ 0.25, 
                           TRUE ~ .)) %>%
  mutate(expensefood30d = case_when(expensefood30d == 11 ~ NA_real_, 
                                        expensefood30d == 12 ~ NA_real_,
                                        TRUE ~ expensefood30d)) %>% 
      mutate(expense_food = expensefood30d / 10) %>%
      # fcs indicators
  mutate(hhfcs_inv = 112 - HHFCS,
         hhfcs_inv = range_wna(hhfcs_inv),
         fcs_borderline_poor = ifelse(HHFCS <= 35, 1, 0)) %>%
  mutate_at(vars(fcsstap, fcspulse, fcsdairy, fcspr, fcsveg, fcsfruit, fcsfat, fcssugar),
                ~ range_wna(7 - .)) %>%
  rename_with(str_replace, pattern = "fcs", replacement = "fcs_", matches("fcs")) %>%  
  rename(fcs_protein = fcs_pr, fcs_staple = fcs_stap, hhfcs_inv = hhfcs__inv, 
         fcs_borderline_poor = fcs__borderline_poor) %>% 
  mutate(income_source = case_when(income_main %in% c(1, 2, 3, 7, 8, 9, 10) ~ "income_ms_agriculture",
                                   income_main == 4 ~ "income_ms_livestock",
                                   income_main == 5 ~ "income_ms_fisheries",
                                   income_main %in% c(11, 13, 14) ~ "income_ms_stable_non_ag",
                                   income_main %in% c(6, 12) ~ "income_ms_casual_non_ag",
                                   income_main %in% c(15, 16, 17, 18) ~ "income_ms_selfem_nonwork",
                                   income_main %in% c(19, 20) ~ "income_ms_no_income", 
                                   TRUE ~ "income_ms_stable_non_ag")) %>%
  left_join(fao_survey %>% 
              mutate(income_source = case_when(income_main %in% c(1, 2, 3, 7, 8, 9, 10) ~ "income_ms_agriculture",
                                   income_main == 4 ~ "income_ms_livestock",
                                   income_main == 5 ~ "income_ms_fisheries",
                                   income_main %in% c(11, 13, 14) ~ "income_ms_stable_non_ag",
                                   income_main %in% c(6, 12) ~ "income_ms_casual_non_ag",
                                   income_main %in% c(15, 16, 17, 18) ~ "income_ms_selfem_nonwork",
                                   income_main %in% c(19, 20) ~ "income_ms_no_income", 
                                   TRUE ~ "income_ms_stable_non_ag")) %>%
              select(survey_id, income_source) %>%
              mutate(value = 1) %>%
              pivot_wider(names_from = income_source, values_from = value, values_fill = 0),
            by = "survey_id") %>% 
      # imputed income_main_group value -- look at the "maybe_later" below to see how it was done
      mutate(rural = ifelse(ruralurban == 1, 1, 0)) %>% 
      # imputed hoh sex -- mode
      mutate(hoh_female = ifelse(is.na(hoh_female), 0, hoh_female)) %>% 
      # imputed fies_ranout_hhs -- mode
      mutate(fies_ranout_hhs = ifelse(is.na(fies_ranout_hhs), 1, fies_ranout_hhs)) %>%  
  mutate(shocks_none = shock_noshock,
         shocks_lostwork = shock_lostemplorwork, 
         shocks_sicknessdeath = shock_sicknessordeathofhh, 
         shocks_foodprices = shock_higherfoodprices, 
         shocks_conflict = shock_violenceinsecconf, 
         shocks_cantworkbusiness = shock_cantworkordobusiness, 
         shocks_naturalhazard = ifelse(shock_flood > 0 | shock_drought > 0 | shock_hurricane > 0 | shock_landslides > 0 |
                                         shock_firenatural > 0 | shock_othernathazard > 0 |
                                         shock_coldtemporhail > 0, 1, 0),
         shocks_accessmarket = shock_napasture, 
         shocks_pricesother = ifelse(shock_higheragrinputprice > 0 | shock_higherfuelprices > 0 | 
                                       shock_othereconomicshock > 0 | shock_lowoutputprice > 0, 1, 0), 
         shocks_other = ifelse(shock_othercropandlivests > 0 | shock_otherintrahhshock > 0 | shock_othermanmadehazard > 0 |
                                 shock_animaldisease > 0 | shock_pestoutbreak > 0 | shock_plantdisease > 0 | 
                                 shock_othercropandlivests > 0 | shock_otherintrahhshock > 0 | shock_theftofprodassets > 0 , 
                               1, 0)) %>% 
  replace_na(list(income_main_amount = 0, income_sec_amount = 0, income_third_amount = 0)) %>% 
  mutate(income_total_amount = income_main_amount + income_sec_amount + income_third_amount,
         food_expenses_mmk = income_total_amount * expense_food, 
         income_per_capita = income_total_amount / hh_size) %>%
  mutate(needs_food = ifelse(need_food > 0, 1, 0),
         needs_cash = ifelse(need_cashassistance > 0, 1, 0),
         needs_medical = ifelse(need_medicalservicesupply > 0, 1, 0),
         needs_agri = ifelse(need_fertilizers > 0 | need_pesticides > 0 | need_techsupporextensionserv > 0 |
                               need_acstomechanisedequipprod > 0|
                               need_accesstoland > 0 | need_seeds > 0 | need_landrehabilitation > 0 | 
                               need_accesstoirrigationwater > 0 | need_supportforprocessprod > 0 |
                               need_agrilabour, 1, 0),
         needs_jobsloans = ifelse(need_loans > 0 | need_jobopportunity > 0, 1, 0),
         needs_other = ifelse(need_clothesshelterwater > 0 | need_educationassistance > 0 | need_fishequiorothersuppfor > 0 |
                                need_infoonsafetymeasures > 0 | need_marketingsupport > 0 | need_other > 0 |
                                need_storageequipmentorfaci > 0 | need_tools > 0 | 
                                need_restockinganimals > 0 | need_veterinaryservices > 0 | need_animalfeed > 0 |
                                need_animalsalemingarantdprice > 0 | need_supptransofanimalsorprod > 0 |
                                need_specialnutritionfood > 0, 1, 0)) %>% 
  replace_na(list(needs_food = 0, needs_cash = 0, needs_medical = 0, needs_agri = 0, 
                  needs_jobsloans = 0, needs_other = 0)) %>% 
  select(-c(shock_otherspecify, covid_otherspecify,shock_ref, covid_ref, shock_dk, covid_dk, 
            crp_proddif_otherspecify, crp_saledif_otherspecify, ls_num_inc_dec_otherspecify, 
            ls_food_supply_otherspecify, ls_proddif_otherspecify, need_dk, need_otherspecify,
            ls_salesdif_otherspecify, fish_saledif_otherspecify, reasonsmigration_otherspecify,
            adm3_ayeyarwady, adm3_chin, adm3_kachin, adm3_kayah, adm3_kayah, adm3_kayin,
            adm3_rakhine, adm3_shan_east, adm3_shan_north, adm3_yangon, adm3_mon)) %>%  
  mutate(food_expenses_per_capita = food_expenses_mmk / hh_size / 3, 
         meb_pc = food_expenses_per_capita / (190555 / 5)) %>% 
  
  mutate(agri_hhd = ifelse(income_ms_agriculture == 1 | income_ms_fisheries == 1 | income_ms_livestock == 1 |
                                      agri_activity == 1 | income_sec <= 10, 1, 0),
         agri_main = ifelse(income_ms_agriculture == 1 | income_ms_fisheries == 1 |
                              income_ms_fisheries == 1 | agri_activity == 1, 1, 0)) %>% 
  rename(children_0_4 = hhsize04)

# fs_score mdepriv
fs_share <- survey %>%  
  select(survey_id, hhfcs_inv, fies_raw, csi_weighted) %>% 
  mutate_at(vars(hhfcs_inv, fies_raw, csi_weighted), scale) %>% 
  mutate_at(vars(hhfcs_inv, fies_raw, csi_weighted), range_wna) %>% 
  mdepriv(c("hhfcs_inv", "fies_raw", "csi_weighted"), method = "bv", output ="all")

# fs_score3 mdepriv
fs_share2 <- survey %>% 
  mutate_at(vars(cs_crisis_sold_prod_assets, cs_crisis_no_school, cs_crisis_reduced_health_exp), ~ . * 0.5) %>% 
  select(survey_id, fcs_staple, fcs_pulse, fcs_dairy, fcs_protein, fcs_veg, fcs_fruit, fcs_fat, fcs_sugar, 
         cs_crisis_sold_prod_assets, cs_crisis_no_school, cs_crisis_reduced_health_exp, 
         cs_emergency_sold_house, cs_emergency_hh_migration, cs_emergency_hh_risk,
         fies_fewfood, fies_skipped, fies_whlday, fies_hungry, fies_ateless, fies_healthy,
                 fies_runout, fies_worried) %>% 
  mdepriv(c("fcs_staple", "fcs_pulse", "fcs_dairy", "fcs_protein", "fcs_veg", "fcs_fruit", "fcs_fat", "fcs_sugar", 
         "cs_crisis_sold_prod_assets", "cs_crisis_no_school", "cs_crisis_reduced_health_exp", 
         "cs_emergency_sold_house", "cs_emergency_hh_migration", "cs_emergency_hh_risk",
         "fies_fewfood", "fies_skipped", "fies_whlday", "fies_hungry", "fies_ateless", "fies_healthy",
         "fies_runout", "fies_worried"), output = "all")

fs_share3 <- survey %>% 
  mutate_at(vars(cs_crisis_sold_prod_assets, cs_crisis_no_school, cs_crisis_reduced_health_exp), ~ . * 0.5) %>% 
  select(survey_id, fcs_staple, fcs_pulse, fcs_dairy, fcs_protein, fcs_veg, fcs_fruit, fcs_fat, fcs_sugar, 
         cs_crisis_sold_prod_assets, cs_crisis_no_school, cs_crisis_reduced_health_exp, 
         cs_emergency_sold_house, cs_emergency_hh_migration, cs_emergency_hh_risk,
         fies_fewfood, fies_skipped, fies_whlday, fies_hungry, fies_ateless, fies_healthy,
                 fies_runout, fies_worried, combined_wt) %>% 
  mdepriv(c("fcs_staple", "fcs_pulse", "fcs_dairy", "fcs_protein", "fcs_veg", "fcs_fruit", "fcs_fat", "fcs_sugar", 
         "cs_crisis_sold_prod_assets", "cs_crisis_no_school", "cs_crisis_reduced_health_exp", 
         "cs_emergency_sold_house", "cs_emergency_hh_migration", "cs_emergency_hh_risk",
         "fies_fewfood", "fies_skipped", "fies_whlday", "fies_hungry", "fies_ateless", "fies_healthy",
         "fies_runout", "fies_worried"), output = "all", sampling_weights = "combined_wt")

fs_share4 <- survey %>% 
  mutate_at(vars(cs_crisis_sold_prod_assets, cs_crisis_no_school, cs_crisis_reduced_health_exp), ~ . * 0.5) %>% 
  select(survey_id, hhfcs_inv, 
         cs_crisis_sold_prod_assets, cs_crisis_no_school, cs_crisis_reduced_health_exp, 
         cs_emergency_sold_house, cs_emergency_hh_migration, cs_emergency_hh_risk,
         fies_fewfood, fies_skipped, fies_whlday, fies_hungry, fies_ateless, fies_healthy,
                 fies_runout, fies_worried, combined_wt) %>% 
  mdepriv(c("hhfcs_inv", 
         "cs_crisis_sold_prod_assets", "cs_crisis_no_school", "cs_crisis_reduced_health_exp", 
         "cs_emergency_sold_house", "cs_emergency_hh_migration", "cs_emergency_hh_risk",
         "fies_fewfood", "fies_skipped", "fies_whlday", "fies_hungry", "fies_ateless", "fies_healthy",
         "fies_runout", "fies_worried"), output = "all", sampling_weights = "combined_wt")

# priority scores
survey <- survey %>%
  left_join(fs_share$data %>% 
              select(score_i, survey_id), by = "survey_id") %>% 
  left_join(fs_share2$data %>% 
              select(fs_score2 = score_i, survey_id), by = "survey_id") %>% 
  left_join(fs_share3$data %>% 
              select(fs_score3 = score_i, survey_id), by = "survey_id") %>% 
  left_join(fs_share4$data %>%  
              select(fs_score4 = score_i, survey_id), by = "survey_id") %>% 
   mutate(priority1 = case_when(hhfcs_inv >= mean(hhfcs_inv) & 
                                    fies_raw_range >= mean(fies_raw_range) & csi_weighted >= mean(csi_weighted) ~ 1,
                               TRUE ~ 0)) %>%
  mutate(priority2 = ifelse(score_i >= mean(score_i), 1, 0)) %>% 
  mutate(priority3 = case_when(hhfcs_inv >= mean(hhfcs_inv) & fies_raw_range >= mean(fies_raw_range) ~ 1, 
                               hhfcs_inv >= mean(hhfcs_inv) & csi_weighted >= mean(csi_weighted) ~ 1, 
                               fies_raw_range >= mean(fies_raw_range) & csi_weighted >= mean(csi_weighted) ~ 1, 
                               TRUE ~ 0)) %>% 
  mutate(priority = case_when(hhfcs_inv >= quantile(hhfcs_inv, probs = 0.75) & 
                                 fies_raw_range >= quantile(fies_raw_range, probs = 0.75) ~ 1, 
                               hhfcs_inv >= quantile(hhfcs_inv, probs = 0.75) & 
                                 csi_weighted >= quantile(csi_weighted, probs = 0.75) ~ 1,
                              fies_raw_range >= quantile(fies_raw_range, probs = 0.75) & 
                                 csi_weighted >= quantile(csi_weighted, probs = 0.75) ~ 1, 
                               TRUE ~ 0)) %>% 
  mutate(priority5 = case_when(hhfcs_inv >= quantile(hhfcs_inv, probs = 0.75) ~ 1, 
                               fies_raw_range >= quantile(fies_raw_range, probs = 0.75) ~ 1,
                               csi_weighted >= quantile(csi_weighted, probs = 0.75) ~ 1, 
                               TRUE ~ 0)) %>% 
  rename(fs_score = score_i)

# survey long
survey_long <- survey %>%
  select(survey_id, priority, fs_score, fs_score2, fs_score3, fs_score4, meb_pc, 
         contains("cs_"), contains("csi_"), contains("fies"), contains("fcs_"), contains("needs_"), 
         hhfcs_inv, need, combined_wt, 
         hoh_female, children_0_4, hh_size, hhdisability, 
         rural, expense_food, not_improved_sanitation, income_main_comp, agri_activity,
         lhcsi_max, lhcsi_range, accessmarket) %>% 
  select(-contains("wt"), -notefcs_, -lhcsi_max_ag) %>% 
  left_join(survey %>% select(survey_id, income_source, hoh_education, hoh_age, residence_type) %>% 
              pivot_longer(cols = -survey_id, names_to = "old_var", values_to = "var") %>%
              mutate(value = 1) %>% 
              select(-old_var) %>% 
              pivot_wider(names_from = var, values_from = value, values_fill = 0), by = "survey_id") %>%
  pivot_longer(cols = -survey_id, names_to = "var", values_to = "value") %>% 
  rbind(survey %>% select(contains("shocks_"), survey_id) %>%
          pivot_longer(cols = -survey_id, names_to = "var", values_to = "value"))

```


## Introduction 

Prioritisation can largely be broken into two steps -- geographic prioritisation and beneficiary selection. 

This document examines data from the FAO-WFP Shocks, Agricultural Livelihoods and Food Security Survey, the Armed Conflict Location and Event Dataset (ACLED) and the Myanmar Information Management Unit (MIMU). 

The first section deals with the development of a beneficiary profile using the FAO-WFP survey data to inform beneficiary selection. The second section makes use of ACLED data for Myanmar to develop a conflict score for township prioritisation. Township and beneficiary selection criteria are linked by a decision tree. The third section presents flood risks by township -- these are only probabilities and serve as references for prepositioning and disaster risk reduction. They not included in the overall prioritisation process, though this might change if disasters occur. This document is closed by a section on technical notes. 

<br>


### References for this report
* ACLED (2022). ACLED data for Myanmar (2010-2022). https://acleddata.com.
* Maria Noel Pi Alperin, Philippe Van Kerm (2009). mdepriv: Synthetic indicators of multiple deprivation. Stata command. http://medim.ceps.lu/stata/mdepriv_v3.pdf
* Atillio Benini, Aldo Benini (2021). mdepriv: Synthetic scores of multiple deprivation. R package version 0.0.3.  https://github.com/a-benini/mdepriv/.
* Gianni Betti, Vijay Verma (2004). A methodology for the study of multi-dimensional and longitudinal aspects of poverty and deprivation. Dipartamento di Metodi Quantitativi, Universita di Siena. http://repec.deps.unisi.it/quaderni/49DMQ.pdf. 
* FAO and WFP (2022). Myanmar | Shocks, agricultural livelihoods and food security: Monitoring report, March 2022. Rome. 
* Food Security Cluster, Myanmar (2021). 5Ws reporting tool.
* Food Security Cluster, Myanmar (2022). Understanding Conflict Dynamics in Myanmar through Conflict and Incident Data: A Food Security Perspective. https://food-security-cluster-myanmar.github.io/exploratory-data-analysis-acled-fsc/.
* HARP-F and MIMU (2018). Vulnerability in Myanmar: A Secondary Data Review of Needs, Coverage and Gaps. http://themimu.info/vulnerability-in-myanmar.
* IFPRI (2022). Household Welfare in Myanmar: Results from the Myanmar Household Welfare Survey. 
* MIMU (2022). Climate, Environmental Degradation and Disaster Risk in Myanmar. 
* UNHCR (2022). Pre and Post 1 Feb 2021 IDPs Population by Township level. 
* WFP Nigeria Country Office (2018). Standard Operating Procedure: Beneficiary Targeting in North Eastern Nigeria. Directive: NCO/PU/2019/04. 

<br><br><br>

## 1. Beneficiary profile, using FAO-WFP survey data

A total of `r nrow(survey) %>% format(big.mark = ",")` households were interviewed in the FAO-WFP Shocks, Agricultural Livelihoods and Food Security Survey in Ayeyarwady, Chin, Kachin, Kayah, Kayin, Mon, Rakhine, Shan and Yangon to monitor food security and livelihoods of persons in Myanmar. Some of the most key data collected were on the various food security indices -- the Food Consumption Score (FCS), the Food Insecurity Experience Scale (FIES) and the Livelihood Coping Strategies Index (CSI). Much more detail can be gleaned from the report itself -- this document focuses only on the aspects of the survey which can be used for prioritisation. Information was also collected on basic household demographics as well on income sources.

 
<br><br>

### 1.1 Overview of the food security indicators

A composite of the three indices, the Food Insecurity Score has also been calculated. The construction of this composite score is detailed in section 4.2. But going forward, this document will use the Food Insecurity Score as a proxy for food insecurity itself. 

The plots below ranks variables in order of their ability to predict the various food insecurity indices. Only environmental or demographic variables were taken into account as it would be redundant to predict food security indices using the components of said indices. Only responses that occurred more than 200 times (out of 2,708) in the survey were considered. 

Variables that are more red are most likely to result in high food insecurity scores (that is, higher food insecurity) and variables that are more blue are more likely to result in lower food insecurity scores. The dotted red line down the middle at $x=0$ splits variables into whether the predict higher food insecurity (positive numbers) or lower food insecurity (negative numbers). 

For all these plots, the scores of the food security indices have all been scaled between 0 and 1, with higher scores indicating greater food insecurity. The coefficient estimates below show how much a positive response on these variables would increase or decrease the score on each of the indices. 

<br>

```{r coefficient-plot-priority, fig.height=8.5}
lm_prep_long <- function(tbl) {
  tbl %>% 
    group_by(var) %>% 
  mutate(count = ifelse(value > 0, 1, 0),
         count = sum(count, na.rm = TRUE)) %>% 
  ungroup() %>% 
  filter(count > 200) %>% # adjust the threshold n
  select(-count) %>% 
  pivot_wider(names_from = var, values_from = value, values_fn = max) %>% 
  select(-survey_id, -shocks_none)
}

plot_coef <- function(tbl) {
  tbl %>%  
    tidy(conf.int = TRUE) %>%
    filter(term != "(Intercept)") %>% 
    mutate(term = str_replace(term, "income_ms", "income_source"),
           term = str_replace(term, "hoh_age", ""),
           term = str_replace(term, "hoh_education", ""), 
           term = fct_reorder(term, estimate)) %>% 
    ggplot(aes(x = estimate, y = term, colour = estimate)) + 
    geom_vline(xintercept = 0, lty = 2, colour = "red") +
    geom_point() + 
    geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) +
    scale_colour_viridis(option = "turbo") +
    theme(legend.position = "none",
          axis.text.y = element_text(size = 8.5),
          axis.title.x = element_text(size = 7, face = "bold"),
          plot.title = element_text(size = 9)) +
    labs(x = "Estimate", y = "")
}

survey_long %>%  
  filter(str_detect(var, "shocks_|hoh_|rural|expense_food|not_improved|agri_activity|income_ms_|edu_|size|children|disability") |
           var %in% c("fs_score4", "expense_food")) %>% 
  lm_prep_long() %>% 
  mutate(fs_score4 = log(fs_score4)) %>% 
  mutate(fs_score4 = ifelse(is.infinite(fs_score4), 0, fs_score4)) %>% 
  lm(fs_score4 ~ ., data = .) %>% 
  plot_coef() + 
  labs(title = "Food insecurity score") + 
  
survey_long %>%
        filter(str_detect(var, "shocks_|hoh_|rural|expense_food|not_improved|agri_activity|income_ms_|edu_|size|children|disability") |
         var %in% c("csi_weighted", "expense_food")) %>%
  lm_prep_long() %>%
  lm(csi_weighted ~ ., data = .) %>%
  plot_coef() + 
  labs(title = "Coping strategies index") + 
  
survey_long %>%
  filter(str_detect(var, "shocks_|hoh_|rural|expense_food|not_improved|agri_activity|income_ms_|edu_|size|children|disability") |
           var %in% c("hhfcs_inv", "expense_food")) %>%
  lm_prep_long() %>%
  lm(hhfcs_inv ~ ., data = .) %>%
  plot_coef() + 
  labs(title = "Food consumption score") + 

survey_long %>%
  filter(str_detect(var, "shocks_|hoh_|rural|expense_food|not_improved|agri_activity|income_ms_|edu_|size|children|disability") |
           var %in% c("fies_raw_range", "expense_food")) %>%
  lm_prep_long() %>%
  lm(fies_raw_range ~ ., data = .) %>%
  plot_coef() + 
  labs(title = "Food insecurity experience scale")
```

<br>

The variables that tend to to contribute the most to high food insecurity scores are a household's exposure to conflict and natural hazards, whether or not they lost employment or work opportunities, whether or not they were engaged in casual non-agricultural labour and whether or not they reported no income in the three months prior to the survey. 

These variables can be thought of as a basic beneficiary profile that can inform beneficiary selection criteria for food security and livelihoods programming. This list is, of course, non-exhaustive as it is limited to only the variables that were collected as part of the FAO-WFP survey.

The variables that were the best predictors of low food insecurity were whether or not the head of household had completed higher education, whether or not the main income source of the household was stable non-agricultural employment. 

<br><br>

### 1.2 Setting a threshold for prioritisation 

An important step is narrowing down the population to a priority group -- Food Security Cluster partners do not have the resources to target all persons.

>A household would be considered a priority for intervention when above the 75th percentile in at least two out of the three food security indices (food consumption score, coping strategies index and food insecurity experience scale)

Approximately 34% of the sampled population falls into this priority group. The plots below shows the differences between the average values of food security index components based on whether they are priority households or not. Variables in blue correspond to Coping Strategies Index indicators, those in red to the Food Insecurity Experience Scale. 

Those in green correspond to the Food Consumption Score. For these indicators, they have been rescaled so that a value of 1 shows that a household has not consumed that food group at all in the previous 7 days and a value of 0 shows that they have consumed that food group every day for the past 7 days. 

<br>

```{r facet-priority-group-comparison}
survey %>% select(contains("fies"), contains("fcs_"), contains("cs_emergency"), contains("cs_crisis"), 
                  priority) %>% 
  select(-fies_raw, -fies_raw_range, -notefcs_, -hhfcs_inv, -fcs_borderline_poor, -contains("wt"), -contains("hhs")) %>%
  drop_na() %>% 
  pivot_longer(cols = -priority, names_to = "var", values_to = "value") %>% 
  group_by(var, priority) %>% 
  summarise(mean = mean(value)) %>% 
  arrange(desc(mean)) %>% 
  mutate(group = case_when(str_detect(var, "fcs") ~ "fcs",
                           str_detect(var, "cs_") ~ "csi", 
                           str_detect(var, "fies") ~ "fies"),
         var = reorder_within(var, by = mean, within = group), 
         priority = recode(priority, `0` = "not_priority", `1` = "priority")) %>% 
  ggplot(aes(x = mean, y = reorder_within(var, by = mean, within = group), fill = group)) + 
  geom_col() +
  geom_text(aes(label = scales::percent(round(mean, digits = 2))), size = 3, hjust = -0.1) + 
  labs(x = "Mean", y = "", title = "Prevalence of food insecurity in the surveyed population") + 
  theme(legend.position = "none", 
        axis.text.y = element_text(size = 8),
        axis.title.x = element_text(size = 8, face = "bold"), 
        plot.title = element_text(size = 10)) +
  expand_limits(x = 1.05) +
  facet_wrap(~priority) + 
  scale_y_reordered()
```

<br> 

Whilst both groups tend to consume low amounts of dairy, pulses, sugar and fruits, these trends are much more pronounced in the priority group. Additionally, households in the priority group are more than twice as likely to have poor protein consumption, be worried about not having enough food and eat less due to a lack of resources. 

Households in the priority group were also more than four times as likely to have reduced healthcare expenditures, have sold productive assets and be unable to eat healthy and nutritious foods and eat only a few kinds of foods. They are also more than five times as likely to skipped meals or gone hungry. 

<br> 

### 1.3 Variables considered in the beneficiary profile

The interactive reference table below shows the various environmental and demographic variables collected by the FAO-WFP survey. The first column lists the variable name, the second and third show the mean values for each variable based on households inside and outside of the priority group -- for instance, from the table, it can be seen that 2.96% of households within the priority group do not have improved sanitation whereas for households outside of the priority group, this percentage is 1.42%. 

The fourth column shows the percentage difference between households in the priority group and households outside of it. To continue the example, households in the priority group are 107.55% more likely to not have access to improved sanitation than households outside of the priority group. The table is sorted in descending order based on this column. 

The last column is the percentage of the surveyed population that the variable corresponds to. So, in spite of households in the priority group having much lower access to improved sanitation, this variable only applies to 2.23% of the total population, meaning that whilst this indicator might be very predictive of households in the priority group, using it on its own would lead to high exclusion errors as it only applies to a small proportion of the population. Sorting the table by **`%_sample`**. 

<br>


```{r dt-comparison-demographic-environmental}

js <- c(
  "function(settings){",
  "  var datatable = settings.oInstance.api();",
  "  var table = datatable.table().node();",
  "  var caption = 'hh_size shows the population mean in the %_sample column'",
  "  $(table).append('<caption style=\"caption-side: bottom\">' + caption + '</caption>');",
  "}"
)

survey %>% 
  mutate(children_0_4 = ifelse(children_0_4 > 0, 1, 0)) %>% 
  select(matches("shocks_|hoh_female|rural|not_improved|agri_activity|income_ms_|size|children|disability"), survey_id) %>% 
  select(-shocks_none, -crp_landsize, -ruralurban) %>% 
  left_join(survey %>% select(hoh_education, hoh_age, residence_type, survey_id) %>% 
              pivot_longer(cols = -survey_id, names_to = "old_var", values_to = "var") %>%
              mutate(value = 1) %>% 
              select(-old_var) %>% 
              pivot_wider(names_from = var, values_from = value, values_fill = 0), by = "survey_id") %>% 
  select(-`NA`) %>%  
  left_join(survey %>% select(priority, survey_id), by = "survey_id") %>% 
  pivot_longer(cols = c(-survey_id, -priority), names_to = "var", values_to = "value") %>% 
  mutate(count = 1) %>%  
  group_by(var, priority) %>% 
  summarise(mean = mean(value)) %>% 
  mutate(priority = recode(priority, `1` = "priority", `0` = "not_priority")) %>% 
  pivot_wider(names_from = priority, values_from = mean) %>% 
  mutate(diff_pc = round((priority - not_priority) / not_priority * 100, digits = 2)) %>%
  left_join(survey_long %>%
              left_join(survey %>% select(combined_wt, survey_id), by = "survey_id") %>%
              mutate(count = ifelse(value > 0, 1, 0)) %>%
              filter(count == 1) %>% 
              group_by(var) %>% 
              summarise(count = sum(combined_wt)), by = "var") %>% 
  mutate(pc_sample = round(count / sum(survey$combined_wt) * 100, digits = 2)) %>% 
  mutate_at(vars(priority, not_priority), ~round(.x * 100, digits = 2)) %>%  
  mutate(pc_sample = case_when(var == "children_0_4" ~ round(sum(survey[which(survey$children_0_4 > 0), 331]) / 
                                                              sum(survey$combined_wt) * 100, digits = 2), 
                                var == "hh_size" ~ 5.10, 
                                TRUE ~ pc_sample), 
         var = str_replace(var, "ms", "source")) %>%  
  arrange(desc(diff_pc)) %>%
  select(variable = var, mean_priority = priority, mean_not_priority = not_priority, 
         `%_difference` = diff_pc, `%_sample` = pc_sample) %>% 
  datatable(filter = list(position = "top", clear = FALSE),
            options = list(pageLength = 10, scrollX = TRUE,
              initComplete = JS(js),#This will avoid looping of the caption when there is pagination
              headerCallback = DT::JS("function(thead) {", 
                          "  $(thead).css('font-size', '.8em');",
                          "}")),
  caption = htmltools::tags$caption(style = "caption-side: top; color: black; 
                                              text-align: center; font-size: 130%;",
                                              "Possible component of a beneficiary profile"), 
  ) %>% 
  formatStyle(0, target = "row", lineHeight = "80%", fontSize = "80%")

```

<br>

As expected, given that the priority grouping is largely based on the Food Consumption Score, Coping Strategies Index and the Food Insecurity Experience Scale, the variables that are least likely (as can be seen by their negative **`%_difference`**) to be associated with households in the priority group are the household head having completed higher education, having a stable source of non-agricultural income, having the main income source be from livestock and having self-employment or non-work (rents etc.) being the main income source. 

Partners are encouraged to make use of the list of variables above, in addition to community perspectives and their own expert judgement, to formulate the beneficiary selection criteria for their food security programmes. The next section makes use of decision trees to find which of these variables are the most useful in splitting the population into households within the priority group and those outside of it. 


<br><br>

### 1.4 Decision trees

```{r df-tree-rules}

tree4 <- survey %>% 
  rpart(priority ~  shocks_lostwork + shocks_sicknessdeath + shocks_foodprices + shocks_conflict + shocks_cantworkbusiness +
          shocks_naturalhazard + shocks_accessmarket + shocks_pricesother + shocks_other + rural + not_improved_sanitation,
        data = .,  weights = combined_wt , cp = 0.01)

tree6 <- survey %>% 
  mutate(children_0_4 = ifelse(children_0_4 > 0, 1, 0)) %>% 
  rpart(priority ~ children_0_4 + hoh_female + hoh_education + hoh_age, data = ., weights = combined_wt, cp = .0088)

survey <- survey %>% 
  mutate(rule4 = row.names(tree4$frame)[tree4$where]) %>% 
  left_join(rpart.rules.table(tree4) %>% 
              filter(Leaf == TRUE) %>% 
              rename(rule4 = Rule) %>% 
              group_by(rule4) %>% 
              summarise(subrules4 = paste(Subrule, collapse = ",")), by = "rule4") %>% 
  mutate(rule6 = row.names(tree6$frame)[tree6$where]) %>% 
  left_join(rpart.rules.table(tree6) %>% 
              filter(Leaf == TRUE) %>% 
              rename(rule6 = Rule) %>% 
              group_by(rule6) %>% 
              summarise(subrules6 = paste(Subrule, collapse = ",")), by = "rule6")


```


This section focuses on the creation of simple tools to aid in the identification of households in the priority group. This involves narrowing down the list of 36 variables in the reference table above to develop two simple decision trees, one focused on a household's environment and circumstances and the other on a household's demographics. 

Both these trees have been kept simple as making them deeper (i.e. by including additional variables and more nodes) provides very limited gains in exchange for a lot of added complexity. Additionally, as decision trees grow larger, they get much less interpretable. 

Each tree starts at the root, node *[1]*  in this case, which contains the entire population. From there, it branches out based on the various conditions at each node until it terminates in four groups (4, 5, 6 and 7 in the tree below). Within each blue square, the figure in **bold** show the **percentage of each group that are priority households** and the figures in *italics* show the *percentage of the overall population* in each node. 

The first tree, dealing with a household's environment and circumstances is comprised of three splits -- whether a household is in a rural area, whether they have experienced violence, insecurity or conflict and whether they have lost work or employment opportunities. 

<br>

**Decision tree -- environment and circumstances**
![](decision_tree.png)
<br>

The percentage of priority households in rural areas is much higher than in urban ones (37.6% vs 26.1%). For rural areas, the indicator that best distinguishes priority households is whether or they have directly experienced violence, insecurity or conflict in the past three months. The factor that best distinguishes priority households in urban areas is whether or not they have lost work or employment opportunities in the past three months.  

The performance of the decision tree is not exactly spectacular -- if a partner wanted to target an entire community, the threshold for blanket coverage would be around 75% or 80% of households being in the priority group, which none of the terminal nodes achieve. But the tree is very understandable at least provides some direction for identifying beneficiaries at the community level once geographic prioritisation has been conducted (which is the topic of section 2). 

These findings are also consistent with IFPRI's survey on household welfare where rural households were found to be more asset-poor and less resilient than urban ones. However, much more work remains to be done on triangulating the FAO-WFP survey with the IFPRI one, which was much more comprehensive in scope and scale. 

The second tree below makes use of demographic variables (age of head of household, whether or not the head of the household is female etc.). These have been treated separately from the environmental and circumstantial variables above as combining them led to trees that did not make sense i.e. whether or not a household has children under 5 should have no bearing on whether or not they lost access to markets in the past three months. Both trees start from the same root i.e. 100% of the population. 

In the second tree below, which splits the population into three terminal nodes (B, D and E), households whose heads have not completed secondary and/or higher education *and* have children under 5 years of age are most likely to fall into the priority group (46.8%). Households whose heads have competed secondary and/or higher education are least likely to fall into the priority group, irrespective of the age of their children or even if they have children.  

<br>

**Decision tree -- demographics**
![](decision_tree_demographics.png)
<br>

These decision trees have been constructed not to direct decisions in partners' programmes, but more to inform them, and should not be treated as absolutes -- there are a wide variety of contexts in Myanmar and each community will likely have additional relevant criteria. 

But the application of the conditions set in this tree and the first one (in blue) will lead to much lower inclusion and exclusion errors in beneficiary selection. The Food Security Cluster remains open to more data that will help improve our collective understanding of food insecurity in Myanmar. 


<br><br>

### 1.5 Exceptions for targeting agricultural households

Returning momentarily to the first tree on environmental and circumstantial variables (in blue), 64.7% of the total population is in the tree node *[6]* (the nodes are all numbered in the decision tree above) -- within this group are households who reside in rural areas but have not been affected by violence, conflict or insecurity. Given that this is a significant chunk of the population where, with reference to the plot below, 78% of the population is involved in agriculture, an exception can be made in this one instance, to support partners who might want to select beneficiaries within the large group of agricultural households who have not been affected by conflict.  

It must still be mentioned that aid must first flow to the households in node *[7]*, where the proportion of households in the priority group is highest. Only after the needs of this population are met should households in node *[6]* be targeted. 

<br>

```{r barplot-pc-agri}
survey %>% 
  mutate(agri_hhd = agri_hhd * combined_wt) %>% 
  group_by(rule4) %>% 
  summarise(sum = sum(agri_hhd), 
            combined_wt = sum(combined_wt)) %>% 
  mutate(pc_ag = sum / combined_wt, 
         pc = combined_wt / sum(combined_wt)) %>% 
  mutate(rule4 = fct_rev(rule4)) %>% 
  ggplot(aes(x = pc, y = rule4, fill = pc_ag)) + 
  geom_col() +
  geom_text(aes(label = scales::percent(round(pc_ag, digits = 3))),
            hjust = -0.1) + 
  scale_fill_viridis(option = "mako", direction = -1, begin = 0.15) +   
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  labs(x = "Percent of total population", 
       y = "Tree node",
       title = "Share of agricultural households by tree node", 
       fill = "% agri\nhhd") + 
  theme(axis.text.y = element_text(size = 12)) + 
  expand_limits(x = .68)
```

<br> 

The tree below carries on from node *[6]* of the main decision tree on environmental and circumstantial variables. Within this group of rural households not affected by conflict (64.7% of the total population), identifying households who have lost access to markets due to conflict, infrastructure damage or other access issues and households who have been affected by natural hazards (the most common of which is flooding) improves the chances of targeting households in the priority group to slightly above 50%. 

Any further conditions applied on the tree only result in extremely marginal gains. 

<br>

**Decision tree -- from tree node [6]**
![](agricultural_exception.png)

<br>

To further inform the selection of beneficiaries in these areas, the plot below shows the coefficients of environmental and demographic variables on food insecurity within the households in node *[6]*. The more positive (more red) the coefficient, the higher the impact on food insecurity.  

The variables within the FAO-WFP survey most of use to partners targeting priority households in node *[6]* are whether or not a household has improved sanitation and the educational attainment of the head of the household. The most important shocks for this group were natural hazards and the loss of work or employment opportunities. 

<br>

```{r coef-plot-node6}
survey %>% filter(rule4 == 6) %>% 
  select(contains("shocks_"), not_improved_sanitation, contains("income_ms_"),
         hoh_education, hoh_female, hoh_age, fs_score4, children_0_4) %>% 
  select(-c(shocks_conflict, income_ms_fisheries)) %>% 
  mutate(fs_score4 = log(fs_score4)) %>% 
  mutate(fs_score4 = ifelse(is.infinite(fs_score4), 0, fs_score4)) %>% 
  lm(fs_score4 ~ ., data = .) %>% 
    tidy(conf.int = TRUE) %>%
    filter(term != "(Intercept)") %>% 
    mutate(term = str_replace(term, "income_ms", "income_source"),
           term = str_replace(term, "hoh_age", ""),
           term = str_replace(term, "hoh_education", ""), 
           term = fct_reorder(term, estimate)) %>% 
    ggplot(aes(x = estimate, y = term, colour = estimate)) + 
    geom_vline(xintercept = 0, lty = 2, colour = "red") +
    geom_point() + 
    geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) +
    scale_colour_viridis(option = "turbo") +
    theme(legend.position = "none",
          axis.text.y = element_text(size = 8.5),
          axis.title.x = element_text(size = 7, face = "bold")) +
    labs(x = "Estimate", y = "", 
         title = "Coefficient plot for households in tree node [6]", 
         subtitle = "Higher scores are more predictive of food insecurity")

```


<br><br>


### 1.6 Impact of conflict on food security

Conflict has a real impact on food security. Exposure to conflict in these households was linked with a `r round(filter(survey, shocks_conflict == 1) %>% {mean(.$fs_score4)} - filter(survey, shocks_conflict == 0) %>% {mean(.$fs_score4)}, digits = 3)` increase in score, which is, on average, a `r round((filter(survey, shocks_conflict == 1) %>% {mean(.$fs_score4)} - filter(survey, shocks_conflict == 0) %>% {mean(.$fs_score4)}) / filter(survey, shocks_conflict == 0) %>% {mean(.$fs_score4)} * 100, digits = 2)`% higher.

Though the FAO-WFP survey was not developed with measuring the impact of conflict in mind, it is nevertheless possible to control for some factors and examine the impact of conflict on several groups. 

The plot below shows the impact on the overall food insecurity scores based on whether a household has been impacted by conflict or not. We have selected the two largest groups within the rural and urban context (farmers and workers employed in stable, non-agricultural employment) and have split these populations by education level. 

<br>

```{r facet-conflict-impact}
conflict_impact <- function(tbl) {
  tbl %>% 
    group_by(hoh_education, shocks_conflict) %>% 
    filter(!is.na(hoh_education),
           hoh_education != "edu_religious") %>% 
    summarise(mean_score = mean(fs_score4)) %>% 
    ungroup() %>% 
    mutate(shocks_conflict = recode(shocks_conflict, `0` = "no", `1` = "yes"),
           shocks_conflict = fct_rev(shocks_conflict),
           hoh_education = fct_relevel(hoh_education,
                                       c("edu_none", "edu_primary", "edu_secondary", "edu_higher")))
}
  
survey %>% 
  filter(income_ms_stable_non_ag == 1 & rural == 0) %>%
  conflict_impact() %>% 
  mutate(category = "urban_workers") %>% 
  rbind(
    survey %>% 
      filter(income_ms_agriculture == 1 & rural == 1) %>% 
      conflict_impact() %>% 
      mutate(category = "rural_farmers")
  ) %>% 
  ggplot(aes(x = mean_score, y = hoh_education, fill = shocks_conflict)) + 
  geom_col(position = "dodge") +
  geom_text(aes(label = round(mean_score, 3)), 
            size = 3.5, hjust = "inward", 
            position = position_dodge(width = 0.9)) +
  labs(x = "Mean food insecurity score", y = "Education level, household head",
       fill = "shock:\nconflict",
       title = "Impact of conflict on food insecurity",
       subtitle = "Higher scores indicate higher food insecurity") +
  expand_limits(x = 0.16) +
  facet_wrap(~ category)

```

<br>


The food insecurity scores of all groups except one are higher when a household has experienced conflict. This by no means constitutes a scientific proof (another survey would have to be conducted), but it is possible to state with some confidence that exposure to conflict has a negative impact on food security. The small sample size likely results in widely diverging values in some cases; additionally the purpose of the FAO-WFP survey was not to understand conflict dynamics. 

The plot below further examines of the impact of conflict on food security. The estimates in the plot below show the impact of exposure to conflict on the various food security indices. As was done above, the food security indices used were re-scaled between 0 and 1, with higher scores indicating greater food insecurity. 

Notably, exposure to conflict had the largest impact on the Food Insecurity Experience Scale in rural areas. All the effects were statistically significant with the exception of conflict on Food Consumption Scores in urban contexts (that point is marked by a triangle in the plot below to denote this). 

<br>


```{r estimate-plot-fs-indices}

con_est <- function(tbl, y, x, z){
  
  y <- enquo(y)
  x <- enquo(x)
  z <- enquo(z)
  
  model_formula <- formula(paste0(quo_name(y), "~", quo_name(x)))
  
  lm(model_formula, data = tbl) %>% 
    tidy(conf.int = TRUE) %>% 
    filter(term != "(Intercept)") %>% 
    mutate(model = paste0(quo_name(y), "_", quo_name(z)))
}

rbind(
survey %>% 
  filter(rural == 1) %>% 
  con_est(hhfcs_inv, shocks_conflict, rural), 
survey %>% 
  filter(rural == 1) %>% 
  con_est(fies_raw_range, shocks_conflict, rural),
survey %>%  
  filter(rural == 1) %>% 
  con_est(csi_weighted, shocks_conflict, rural),
survey %>% 
  filter(rural == 0) %>% 
  con_est(hhfcs_inv, shocks_conflict, urban), 
survey %>% 
  filter(rural == 0) %>% 
  con_est(fies_raw_range, shocks_conflict, urban),
survey %>%  
  filter(rural == 0) %>% 
  con_est(csi_weighted, shocks_conflict, urban)
) %>% 
  mutate(significant = ifelse(p.value <= 0.05, 1, 0),
         model = recode(model, "hhfcs_inv_rural" = "FCS_rural", "hhfcs_inv_urban" = "FCS_urban",
                        "fies_raw_range_rural" = "FIES_rural", "fies_raw_range_urban" = "FIES_urban",  
                        "csi_weighted_rural" = "CSI_rural", "csi_weighted_urban" = "CSI_urban"),
         model = fct_reorder(model, estimate)) %>% 
  ggplot(aes(x = estimate, y = model, colour = estimate)) + 
  geom_vline(xintercept = 0, lty = 2, colour = "red") + 
  geom_point(aes(pch = fct_rev(factor(significant))), size = 2) + 
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) + 
  scale_colour_viridis(option = "turbo", guide = "none") + 
  labs(x = "Estimate", y = "", 
       title = "Effect of conflict on food security indices", 
       subtitle = "Split by rural/urban contexts",
       pch = "Significant") 
  
```

<br>

Approximately 6% of all the households surveyed by FAO-WFP were directly affected by conflict, violence and insecurity. However, this does discount households that have been affected by the secondary impacts of conflict, such as loss of access to markets and increased food prices. 

Whilst there is no set threshold whereby one can state that everyone in a given area has been affected by conflict, it is possible to determine where the conflict has been the most severe. In the next section, conflict scores, together with pre-existing vulnerability will be used to prioritise between townships.


<br><br><br>

## 2. Prioritisation of townships by exposure to conflict and pre-existing vulnerability

Myanmar had the most conflict events out of any country in 2021, exceeding Syria, Yemen and Afghanistan. 

<br>

```{r table-conflict-years-compare}
acled %>% 
  filter(sub_event_type != "Peaceful protest") %>% 
  mutate(years = ifelse(year %out% c(2021, 2022), "2010-2020", year)) %>%
  group_by(years) %>%
  summarise(total_events = n()) %>% 
  mutate(avg_events_per_year = ifelse(years %in% c(2021, 2022), total_events, total_events / 11)) %>% 
  left_join(acled %>%
              filter(sub_event_type != "Peaceful protest") %>% 
              mutate(years = ifelse(year %out% c(2021, 2022), "2010-2020", year)) %>%
              group_by(years) %>%
              summarise(total_fatalities = sum(fatalities)) %>% 
              mutate(avg_fatalities_per_year = ifelse(years %in% c(2021, 2022), total_fatalities, total_fatalities / 11)), 
            by = "years") %>%
  mutate_at(vars(avg_events_per_year, avg_fatalities_per_year), ~round(.x, digits = 0)) %>% 
  kable(caption = "Increase in conflict events and fatalities compared to previous decade", 
        format.args = list(big.mark = ",")) %>% 
  kable_classic_2("striped") %>% 
  footnote("Data source: ACLED; accleddata.com. Data for 2022 is until 2022-05-31.", general_title = "")

```

<br>

Given the more than fourteenfold increase in conflict events and thirteenfold increase in conflict fatalities in Myanmar in 2021 (compared to the average for the preceding 10 years), conflict will be major part of determining which areas have the greatest humanitarian needs. However, scoring townships based only on one variable would lead to a very uni-dimensional understanding of the crisis. 

Similar to how a beneficiary profile was established in order to inform beneficiary selection, this section, will focus on what variables should be considered when prioritising amongst the various geographic areas in Myanmar and how the results of such a prioritisation should be applied. As with earlier sections, the analysis here is meant to describe and inform programmatic and operational decision making, as opposed to dictating it, as each partner will have their own considerations and limitations. 

For a more comprehensive review of conflict in Myanmar, please refer to the Food Security Cluster's report [Understanding Conflict Dynamics in Myanmar through Conflict and Incident Data: A Food Security Perspective](https://food-security-cluster-myanmar.github.io/exploratory-data-analysis-acled-fsc/). And for a more comprehensive review of multidimensional vulnerabilty in Myanmar, please refer to MIMU-HARP's [Vulnerability Study](http://themimu.info/sites/themimu.info/files/documents/Report_Vulnerability_in_Myanmar_HARP-MIMU_Jun2018_ENG_Print_version.pdf).  

<br><br>

### 2.1 Conflict events and fatalities by state/region

```{r confict-state-barplot}
conflict_score %>% 
  select(admin3_pcode, battles, explosions_remote_violence, protests_and_riots, 
         strategic_developments, violence_against_civilians) %>% 
  pivot_longer(cols = c(battles:violence_against_civilians), names_to = "event_type", values_to = "value") %>% 
  left_join(fs_pin %>% select(admin3_pcode, state), by = "admin3_pcode") %>% 
  group_by(state) %>% 
  mutate(total_events = sum(value)) %>% 
  ungroup() %>% 
  mutate(state = fct_reorder(state, -total_events)) %>% 
  ungroup() %>% 
  ggplot(aes(x = state, y = value, fill = event_type)) + 
  geom_col() +
  scale_y_continuous(labels = comma) +
  theme(axis.text.x = element_text(angle = 30, vjust = 0.7, hjust = 0.7), 
        plot.caption = element_text(hjust = 0.5)) +
  labs(x = "", 
       y = "Conflict events",
       title = "Conflict events by state (Jan 2021 - May 2022)",
       caption = "Data source: ACLED; acleddata.com")

#ggsave("event_type_state.png", dpi = 300, height = 5, width = 8, units = "in")
```

<br>

Starting at the state/region level, Sagaing saw the highest number of conflict events as well as conflict as well as conflict-related fatalities in 2012. It experienced more than three times as many conflict-related fatalities than the next-highest state/region -- Magway. This is a significant shift in the pattern of conflict in Myanmar, which has traditionally revolved around Kachin, Rakhine and Shan. 

Kayah, Chin and Sagaing had the highest number of conflict fatalities per capita in 2021. 

<br>

```{r conflict-fatalities-barplot}

conflict_score %>% 
  select(admin3_pcode, fatalities) %>% 
  left_join(fs_pin %>% select(admin3_pcode, state, population_2021_proj), by = "admin3_pcode") %>% 
  group_by(state) %>% 
  summarise(fatalities = sum(fatalities), 
            pop = sum(population_2021_proj)) %>% 
  mutate(fatalities_pc = fatalities / pop) %>% 
  mutate(state = fct_reorder(state, -fatalities)) %>% 
  ggplot(aes(x = state, y = fatalities, fill = fatalities_pc)) +
  geom_col() + 
  scale_fill_viridis(option = "magma", direction = -1) +
  scale_y_continuous(labels = comma) +
  theme(axis.text.x = element_text(angle = 30, vjust = 0.7, hjust = 0.7),
        plot.caption = element_text(hjust = 0.5)) +
  labs(x = "",
       y = "Number of fatalities", 
       title = "Conflict fatalities by state (Jan 2021 - May 2022)",
       subtitle ="and fatalities per capita", 
       fill = "fatalities\nper capita",
       caption = "Data source: ACLED; acleddata.com")

# ggsave("fatalities_state.png", dpi = 300, height = 5, width = 8, units = "in")
  
```

<br><br>

### 2.2 Conflict score by township

As can be seem from the state/region barplots in the previous section, the distribution of conflict events and fatalities is not even, and is skewed towards a few states/regions. This is also evident at township level. In the scatter plot below, the averages of the number of conflict events and the number of fatalities at the township level have been marked by the dotted red lines, dividing the plot into four quadrants. 

To better discriminate amongst the numerous townships, a conflict score has been developed. A requisite for any prioritisation score or index for conflict should be the ability to distinguish, first and foremost, the townships in the upper right quadrant of the plot, which have the heaviest concentrations of conflict events and fatalities. These areas are also where, as we have established in the section on beneficiary selection, where the highest concentrations of households in the priority group will be found. For reference, 58 townships have both above-average numbers of conflict events and fatalities (upper-right quadrant) and 196 townships have both below-average numbers of conflict events and fatalities (bottom-left quadrant). 

<br>

```{r plotly-conflict-scatterplot}
conflict_scatter <- conflict_score %>% 
  mutate(total_events = battles + explosions_remote_violence + protests_and_riots + strategic_developments +
           violence_against_civilians) %>% 
  left_join(fs_pin %>% select(state, township, partners_2021, admin3_pcode, population_density), by = "admin3_pcode") %>% 
  mutate(conflict_score = round(conflict_score, digits = 3)) %>% 
  ggplot(aes(y = fatalities, x = total_events, colour = conflict_score, 
             text = paste0(township, ",", "\n",
                           state, ",", "\n",
                           "conflict_score: ", conflict_score, ",", "\n", 
                           "events: ", total_events, ",", "\n",
                           "fatalities: ", fatalities, ",", "\n",
                           "partners_2021: ", partners_2021))) +
  geom_point(aes(size = fatalities), alpha = 0.85) +
  scale_y_continuous(trans = "log10", breaks = c(0, 1, 10, 30, 100, 300, 1000)) +
  scale_x_continuous(trans = "log10", breaks = c(0, 1, 10, 30, 100, 300)) +
  scale_colour_viridis(option = "magma", direction = -1) +
  geom_hline(aes(yintercept = mean(fatalities)), colour = "red", lty = 2) +
  geom_vline(aes(xintercept = mean(total_events)), colour = "red", lty = 2) +
  labs(x = "Conflict events", 
       y = "Fatalities",
       title = "Conflict events and fatalities by township", 
       subtitle = "Means of both axes are marked by the dotted red line",
       colour = "conflict\nscore", 
       size = "fatalities",
       caption = "Data source: ACLED; acleddata.com") 

ggplotly(conflict_scatter, tooltip = c("text"), width = 820) %>% 
  # layout(showlegend = FALSE, legend = list(font = list(size = 6))) %>% 
  config(displayModeBar = FALSE) %>% 
  layout(title = list(text = paste0("Conflict events and fatalities by township",
                                    "<br>",
                                    "<sup>",
                                    "mouse over for details; means marked by red lines; fatalities marked by size","</sup>")))

# ggsave("tsp_scatter_conflict.png", dpi = 300, height = 5, width = 8, units = "in")

```

<br>

At its most basic, the conflict score is just the average of battles, explosions, remote violence, violence against civilians, strategic developments, non-peaceful protests and riots, conflict-related fatalities and IDPs (refer to section 4.5 for more details). Additional information can also be found in the FSC's report [Understanding Conflict Dynamics in Myanmar through Conflict and Incident Data: A Food Security Perspective](https://food-security-cluster-myanmar.github.io/exploratory-data-analysis-acled-fsc/). This score will be now be used as a shorthand for conflict incidence in Myanmar going forward. 

<br><br>

### 2.3 Comparison with multidimensional vulnerability

Conflict, in spite of its out-sized role in the crisis in Myanmar, cannot serve as the only determinant of geographic prioritisation. In order to consider other factors, this document will make use of the MIMU-HARP Vulnerability Index, which is comprised of 8 indicators, selected for their ability to predict the rest of the variables in the 2015 Census dataset. Several of these variables were also collected at district-level and were used to create an updated 2019 Vulnerability index. 

The specific indicators chosen were: 

* % of population without formal identification documents
* % of population without a middle school education
* % of females who were illiterate
* % of households with bamboo or thatched roofs
* % of households with safe sanitation
* % of households with access to electricity
* % of workers who are unpaid family workers
* The child dependency ratio
* The conflict index (reflecting battles, fatalities, violence against civilians and displacement)

These indicators were combined to construct a township-level vulnerability index. Of the indicators above, only the percentage of the population without a middle-school education and percentage of the population without formal identification documents were not collected in the 2019 Inter-Censal survey. 

For the remaining indicators, they were bootstrapped using their 2015 township values and the 2019 district-level data. Going forward, the first seven components of the index will be extracted and treated separately as a shorthand for pre-existing multidimensional vulnerability in Myanmar. 

However, the eighth component of the vulnerability index, the conflict index, first constructed in 2015 from ACLED data, can and has been updated with more recent data -- this was done in the previous section. This next few sections will examine the relationship between underdevelopment and the updated 2021 conflict score and the resulting implications on geographic prioritisation for the Food Security Cluster. 

Firstly, from the scatter plot below, underdevelopment and pre-existing vulnerability and conflict are not good predictors of each other. The points in the plot below, each representing a township, split largely across two arcs -- with townships with high conflict scores largely falling around the median for multidimensional vulnerability and townships with very high multidimensional vulnerability tending to have lower conflict scores. 

<br>

```{r plotly-mdp-scatter}
mdp_scatter <- fs_pin %>%  
  mutate(total_events = battles + explosions_remote_violence + protests_and_riots + strategic_developments +
           violence_against_civilians) %>%
  filter(conflict_score != 0) %>%
  ggplot(aes(x = mdp_adjust, y = conflict_score, colour = conflict_score, 
             text = paste0(township, ",", "\n",
                           state, ",", "\n",
                           "events: ", total_events, ",", "\n",
                           "fatalities: ", fatalities, ",", "\n",
                           "conflict_score: ", round(conflict_score, digits = 3), ",", "\n", 
                           "md_vulnerability: ", round(mdp_adjust, digits = 3), ",", "\n", 
                           "partners_2021: ", partners_2021))) + 
  geom_hline(aes(yintercept = median(conflict_score)), colour = "red", lty = 2) + 
  geom_vline(aes(xintercept = median(mdp_adjust)), colour = "red", lty = 2) +
  geom_point(aes(size = population_2021_proj), alpha = .7) +
  scale_colour_viridis(option = "magma", direction = -1) + 
  labs(x = "Multidimensional vulnerability (2015)", y = "Conflict score (2021)", colour = "conflict\nscore", 
       title = "Multidimensional vulnerability and conflict by township", 
       subtitle = "Averages marked by red lines; population marked by size")

ggplotly(mdp_scatter, tooltip = c("text"), width = 820, height = 550) %>% 
  # layout(showlegend = FALSE, legend = list(font = list(size = 6))) %>% 
  config(displayModeBar = FALSE) %>% 
  layout(title = list(text = paste0("Multidimensional vulnerability and conflict by township",
                                    "<br>",
                                    "<sup>",
                                    "Averages marked by red lines; population marked by size","</sup>")))




```

<br>

This is a reflection of the shift in conflict away from frontier areas to cities and towns that have more strategic targets. The broader involvement of the Bamar majority in the conflict has likely also contributed to this shift -- prior to the February 2021 coup, the conflict was largely between Ethnic Armed Organisations and the Myanmar Armed Forces. However, combatants are now spread throughout the populations, especially with the proliferation of the People's Defence Forces. 

It has been [observed](https://www.irrawaddy.com/news/burma/kachin-independence-army-pdfs-attack-myanmar-junta-bases-in-kachin-state.html) that Ethnic Armed Organisations, such as the Kachin Independence Army, are working closely with the People's Defence Forces, and are training and arming them. This has contributed to the front being moved further forward, outside of the areas that had been traditionally most affected by conflict.

<br><br>

### 2.4 2015 vulnerability bands

This shift in conflict patterns can be further examined by considering the vulnerability bands developed by MIMU-HARP alongside its vulnerability index. In addition to an overall vulnerability score per township, 8 main typologies of vulnerability were also constructed to illustrate the wide variation of contexts and needs in the different parts of the country as well as to group together similar townships so that they may be considered as separate programming blocs. Please refer to the [MIMU-HARP Vulnerability Study](http://themimu.info/sites/themimu.info/files/documents/Report_Vulnerability_in_Myanmar_HARP-MIMU_Jun2018_ENG_Print_version.pdf) for more details. 

According to MIMU-HARP, the result of this grouping is a "lens allowing the most vulnerable to be considered more methodically in policy and programme planning". By using these 8 typologies as a reference, it is possible to understand how the pattern of conflict has changed between 2015 and 2021. 

<br>

```{r table-bands-comparison}
fs_pin %>%  
  mutate(old_conflict_score = 1 - old_conflict_score, 
         count = 1, 
         above_conflict_mean = ifelse(conflict_env >= 0.1261084, 1, 0),
         above_conflict_mean_old = ifelse(old_conflict_env >= 0.01504872, 1, 0)) %>%
  group_by(band_text) %>% 
  summarise(`2019_vulnerability` = mean(mdp_adjust),         
            old_conflict_score = mean(old_conflict_env), 
            conflict_score = mean(conflict_env),
            tsp_count = n(),
            tsp_above_mean_2015 = sum(above_conflict_mean_old == 1), 
            tsp_above_mean_2021 = sum(above_conflict_mean == 1)) %>%
  mutate(pc_above_mean_2015 = round(tsp_above_mean_2015 / tsp_count * 100, digits = 2), 
         pc_above_mean_2021 = round(tsp_above_mean_2021 / tsp_count * 100, digits = 2)) %>% 
  select(-tsp_above_mean_2015, -tsp_above_mean_2021) %>% 
  mutate_at(vars(conflict_score, old_conflict_score, `2019_vulnerability`), ~ round(.x, digits = 3)) %>%
  rename(vulnerability_band = band_text, 
         `2015_score` = old_conflict_score, 
         `2022_score` = conflict_score, 
         `%_>avg_2015` = pc_above_mean_2015, 
         `%_>avg_2021` = pc_above_mean_2021) %>% 
  kable(caption = "Changes in conflict patterns between 2015 and 2021, by vulnerabilty band") %>% 
  kable_classic_2("striped") %>% 
  footnote("Data source: ACLED (accleddata.com) and MIMU; higher scores indicate more vulnerability/conflict", general_title = "") %>% 
  add_header_above(c(" " = 2, "conflict score" = 2, "townships above avg conflict score" = 3))



```

<br>


Conflict, once much more concentrated in underdeveloped frontier areas in the operational areas of Ethnic Armed Organisations, is now much more pronounced in bands 3, 6 and 8, which contain major population centres and secondary cities and towns. Hubs in conflict-affected areas have the highest average 2021 conflict scores. 

Although conflict has persisted or even increased in many of the frontier and remote areas such as Laukkaing, Mongyai and Tangyan (all from band 1), the relative rankings of these areas have changed and they now form a much smaller share of unionwide conflict than they did in 2015. These townships tend to have much lower conflict scores in spite of their high vulnerability, indicating that whilst they remain development priorities, they should fall outside the caseload for humanitarian action. It is recommended that new vulnerability bands be re-developed, as they have proved very useful; but that is outside the scope of this document.  

<br><br>

### 2.5 Clustering townships

In recognition of the different contexts present in Myanmar (and the consequent need for different programming options), a simple K-means clustering was conducted on the townships to split them into prioritisation groups based on their 2021 conflict score, their 2019 vulnerability score and their population density. 

This clustering separates all 330 townships into five groups. The plots below show the spread of townships by prioritisation group across the 2021 conflict score, 2019 vulnerability score and population density. 

<br>

```{r patchwork-clusters}

reverselog_trans <- function(base = exp(1)) {
    trans <- function(x) -log(x, base)
    inv <- function(x) base^(-x)
    trans_new(paste0("reverselog-", format(base)), trans, inv, 
              log_breaks(base = base), 
              domain = c(1e-100, Inf))
}

fs_pin %>%  
  ggplot(aes(x = mdp_adjust, y = conflict_score)) + 
  geom_point(aes(colour = cluster, size = population_2021_proj), alpha = .8) + 
  guides(colour = "none", size = "none") +
  labs(x = "Multidimensional vulnerability (2019)", y = "Conflict score (2021)") + 
  scale_colour_viridis_d(option = "cividis") + 
  
fs_pin %>% 
  ggplot(aes(x = population_density, y = conflict_score)) + 
  geom_point(aes(colour = cluster, size = population_2021_proj), alpha = .8) + 
  scale_x_continuous(trans = reverselog_trans(10), breaks = c(0, 1, 10, 100, 1000, 10000), 
                     labels = comma_format(accuracy = 1)) + 
  scale_size_continuous(labels = comma) + 
  labs(size = "population", 
       x = "Persons per km2 (reversed)", y = "Conflict score (2021)") + 
  scale_colour_viridis_d(option = "cividis") + 

  plot_layout(widths = 1) + 
  plot_annotation(title = "Comparison between vulnerability, population density and conflict score", 
                  subtitle = "Higher vulnerabilty and conflict scores indicate more vulnerability and conflict", 
                  caption = "Data source: ACLED (acleddata.com) and MIMU", 
                  theme = theme(plot.caption = element_text(hjust = .5)))


```

<br>

It can be seen that groups A1 and A2, which have the highest conflict scores, are quite distinct from group D (where the majority of the development caseload resides). Group C has neither high vulnerability nor high conflict incidence. And group B consists of solely urban centres. The table below provides more detail on each of the groups. 

Groups A1 and A2 contained 60% of all conflict events and 83% of all conflict fatalities in 2021. Groups A1 and A2 can be distinguished from each other by the intensity of conflict, with A1 being where the concentrations of conflict events and fatalities are the heaviest. 

An interactive version of the plot on the left can be found below. 

<br> 

```{r table-group-summaries}
fs_pin %>%   
  mutate(total_events = battles + explosions_remote_violence + protests_and_riots + strategic_developments +
           violence_against_civilians) %>%
  replace_na(list(fatalities = 0)) %>% 
  group_by(group = cluster) %>% 
  summarise(conflict_events = sum(total_events, na.rm = TRUE),
            fatalities = sum(fatalities), 
            vulnerability_2019 = mean(mdp_adjust),
            avg_conflict_score = mean(conflict_score, na.rm = TRUE), 
            ppl_km2 = round(mean(population_density), digits = 0), 
            population = sum(population_2021_proj), 
            townships = n()) %>% 
  mutate(`%_total_population` = round(population / sum(population) * 100, digits = 2),
         `%_fatalities` = round(fatalities / sum(fatalities) * 100, digits = 2), 
         `%_conflict_events` = round(conflict_events / sum(conflict_events) * 100, digits = 2)) %>% 
  select(group, `%_conflict_events`, `%_fatalities`, avg_conflict_score, vulnerability_2019, ppl_km2, 
         townships, `%_total_population`) %>% 
  mutate_at(vars(avg_conflict_score, vulnerability_2019), ~round(.x, digits = 3)) %>%
  kable(caption = "Summary statistics by prioritisation group", format.args = list(big.mark = ",")) %>% 
  kable_classic_2("striped") %>% 
  footnote(footnote_as_chunk = TRUE, threeparttable = TRUE, 
           general = "Groups A1 and A2 have the highest conflict scores and should be prioritised over the others. Higher scores indicate more vulnerability/conflict.", general_title = "") 

# %>%
#  save_kable(file = "summary_stats_groups.png", zoom = 2)
```
<br>

Groups A1 and A2 both have middling vulnerability scores, but have much higher average conflict scores. Group A1, in particular, has a very high concentration of conflict incidents and fatalities, in addition to having the second-highest vulnerability scores of the groups. These 69 townships (containing about 24% of the population) are clear priorities for humanitarian action. 

<br>

```{r plotly-group-scatter}
group_scatter <- fs_pin %>%  
  mutate(total_events = battles + explosions_remote_violence + protests_and_riots + strategic_developments +
           violence_against_civilians) %>%
  filter(conflict_score != 0) %>%
  ggplot(aes(x = mdp_adjust, y = conflict_score, colour = cluster, 
             text = paste0(township, ", ", "\n",
                           state, "\n",
                           "group: ", cluster, "\n",
                           "fatalities: ", fatalities, "\n",
                           "conflict_score: ", round(conflict_score, digits = 3), "\n", 
                           "vul_score: ", round(mdp_adjust, digits = 3), "\n", 
                           "population: ", round(population_2021_proj, digits = 0), "\n", 
                           "partners_2021: ", partners_2021))) + 
  geom_hline(aes(yintercept = median(conflict_score)), colour = "red", lty = 2) + 
  geom_vline(aes(xintercept = median(mdp_adjust)), colour = "red", lty = 2) +
  geom_point(aes(size = population_2021_proj), alpha = .7) +
  scale_colour_viridis_d(option = "cividis") + 
  labs(x = "Multidimensional vulnerability (2015)", y = "Conflict score (2021)", 
       colour = "group", size = "", 
       title = "Multidimensional vulnerability and conflict by township", 
       subtitle = "Averages marked by red lines; population marked by size")

ggplotly(group_scatter, tooltip = c("text"), width = 820, height = 550) %>% 
  # layout(showlegend = FALSE, legend = list(font = list(size = 6))) %>% 
  config(displayModeBar = FALSE) %>% 
  layout(title = list(text = paste0("Multidimensional vulnerability and conflict by prioritisation group",
                                    "<br>",
                                    "<sup>",
                                    "Averages marked by red lines; population marked by size","</sup>")))


```


<br><br>

### 2.6 Linking geographic prioritisation and beneficiary selection 

Using the classifications from the decision tree in section 1.3, we can see that in rural areas, the highest proportions of people who are in the priority group are those who have been affected by conflict, followed by those who were affected by conflict and have lost work or employment opportunities. Rural areas severely-affected by conflict (the townships in groups A1 and part of A2) are the areas in which we may reach populations with the highest proportions of households in the priority group.

<br>

```{r table-priority-hhds-more}
survey %>% 
  mutate(priority_adj = priority * combined_wt) %>% 
  group_by(rural, shocks_conflict, shocks_lostwork) %>% 
  summarise(priority = sum(priority_adj), 
            total_pop = sum(combined_wt)) %>% 
  mutate(pc = round(priority / total_pop * 100, digits = 2)) %>% 
  mutate_at(vars(rural, shocks_conflict, shocks_lostwork), ~recode(., `0` = "no", `1` = "yes")) %>% 
  filter(pc > 40) %>%
  select(-priority, -total_pop) %>% 
  rename(`%_Priority` = pc,
         Rural = rural, 
         Shocks_Conflict = shocks_conflict, 
         Shocks_Lost_Work = shocks_lostwork) %>% 
  kable(caption = "Where are the highest proportions of persons in priority groups?") %>% 
  kable_classic_2("striped", full_width = FALSE) %>% 
  column_spec(1:4, width = "8em") %>% 
  footnote("Data sources: ACLED; acleddata.com and FAO-WFP", general_title = "")

```



<br> 

In urban areas (largely group B, and part of A2), however, conflict incidence is not as important of a determining factor -- the best results can be achieved by targeting people who have lost work or employment opportunities, irrespective of whether or not they have been affected by conflict.  

The table below shows the characteristics of the population groups that have the lowest concentrations of priority households.  

<br>

```{r table-priority-hhds-less}
survey %>% 
  mutate(priority_adj = priority * combined_wt) %>% 
  group_by(rural, shocks_conflict, shocks_lostwork) %>% 
  summarise(priority = sum(priority_adj), 
            total_pop = sum(combined_wt)) %>% 
  mutate(pc = round(priority / total_pop * 100, digits = 2)) %>% 
  mutate_at(vars(rural, shocks_conflict, shocks_lostwork), ~recode(., `0` = "no", `1` = "yes")) %>% 
  filter(pc <= 40) %>%
  select(-priority, -total_pop) %>% 
  rename(`%_Priority` = pc,
         Rural = rural, 
         Shocks_Conflict = shocks_conflict, 
         Shocks_Lost_Work = shocks_lostwork) %>% 
  kable(caption = "Which areas have less people in the priority group?") %>% 
  kable_classic_2("striped", full_width = FALSE) %>% 
  column_spec(1:4, width = "8em") %>% 
  footnote("Data sources: ACLED; acleddata.com and FAO-WFP", general_title = "")
```

<br>

As mentioned earlier, rural households were found to be less resilient and more asset-poor in IFPRI's household welfare survey. These findings align well with the process the Food Security Cluster has developed here to identify priority households (where rural households are significantly more likely to fall into the priority group). It would be extremely fruitful to explore whether this alignment extends to conflict and some of the other environmental and socioeconomic variables that have been employed in this document. 

It should also be noted that townships themselves are quite large -- on average, each has a population of `r summarise(fs_pin, mean = round(mean(population_2021_proj), digits = 0)) %>%  pull() %>% format(big.mark = ",")` persons. In areas where partners are not yet present, this might necessitate an intermediate step to help partners identify specific areas where they could begin working. 

In the ACLED dataset, of the `r filter(acled, year == 2021 & sub_event_type != "Peaceful protest") %>%  nrow() %>%  format(big.mark = ",")` events in 2021, 38% of them were recorded with specific village locations. 

<br>

```{r table-villages}
acled %>% 
  filter(year == 2021 & sub_event_type != "Peaceful protest") %>%
  mutate(has_village = fct_relevel(has_village, c("yes", "no"))) %>% 
  group_by(has_village) %>% 
  summarise(events = n(), 
            fatalities = sum(fatalities),
            locations = n_distinct(interaction(admin3, location)), .groups = "drop") %>%
  mutate(`%_events` = round(events / sum(events) * 100, digits = 2), 
         `%_fatalities` = round(fatalities / sum(fatalities) * 100, digits = 2)) %>% 
  select(mentions_vilage = has_village, events, `%_events`, fatalities, `%_fatalities`, locations) %>% 
  kable(caption = "Conflict events with and without villages", format.args = list(big.mark = ","), 
        table.attr = "style = 'width:60%;'") %>% 
  kable_classic_2() %>% 
  footnote("Data source: ACLED; accleddata.com", general_title = "")


```

<br>

Below is an interactive reference table of the 1,917 villages identified in the ACLED dataset, complete with coordinates. While this list does provide an excellent start, by working in these areas, partners should also endeavour to identify the specific locations of the remaining 62% of conflict events. 

<br>

```{r dt-conflict-villages}
acled %>%
  mutate(has_village = fct_relevel(has_village, c("yes", "no"))) %>%
  filter(year == 2021 & sub_event_type != "Peaceful protest" & has_village == "yes") %>%
  group_by(state = admin1, township = admin3, village = location) %>% 
  mutate(event_count = 1) %>% 
  summarise(events = n(), 
            fatalities = sum(fatalities), 
            battles = sum(event_count[event_type == "Battles"]), 
            remote_violence = sum(event_count[event_type == "Explosions/Remote violence"]), 
            violence_v._civilians = sum(event_count[event_type == "Violence against civilians"]),
            latitude = max(latitude),
            longitude = max(longitude)) %>%
  filter(village %out% c("Mandalay", "Yangon")) %>% 
  arrange(desc(fatalities)) %>% 
  datatable(filter = list(position = "top", clear = FALSE), 
            options = list(pageLength = 10, scrollX = TRUE, 
                           headerCallback = DT::JS("function(thead) {", 
                          "  $(thead).css('font-size', '.8em');",
                          "}")), 
            caption = htmltools::tags$caption(style = 'caption-side: top;
                                              text-align: center;
                                              color: black; font-size: 120%;',
                                              "Reference table -- conflict-affected villages")) %>% 
  formatStyle(0, target = "row", lineHeight = "80%", fontSize = "80%")

```

<br><br>

### 2.6a Sample beneficiary selection process

Though partners are likely to already have their own beneficiary targeting procedures, based on WFP's guidelines, an example targeting procedure is listed below, with reference to the information presented in this document. 

**Sample targeting process**

* Identification and selection of geographical area 
  - Selection of township, from the relevant prioritisation group
  - Coordination with the Cluster and Cluster partners to ensure that there are no duplications and that areas served are in line with the severity and magnitude of humanitarian needs
  - Ensure that programming is reflective of the context of the area targeted -- rural or urban, conflict-affected or not 
  
* Set up complaints and feedback mechanism to ensure that issues and concerns raised by community members are addressed efficiently. The complaints and feedback mechanism will be active throughout this entire process. 

* Enumeration and headcount of community members

* Formation of beneficiary targeting committee which is representative of the community

* Presentation of draft targeting criteria, for example (selected from variables most predictive of food insecurity from section 1.1 and decision trees):
  - Households which have lost work of employment opportunities
  - Households which have suffered from natural hazards
  - Households which have been affected by conflict, insecurity or violence 
  - Households which rely on casual, non-agricultural labour for income 
  - Households which have no income source 
  - Households whose head have no educational attainment
  - Households with children under 5 
  - Households whose members include persons with disabilities 
  
* Sensitisation and consultation on beneficiary selection criteria and finalisation of selection criteria

* Drafting of beneficiary list -- only households which meet the established criteria should be included

* Public posting of initial beneficiary list, for public comment, typically for a maximum of 5 days 

* Spot checks and verification on a random selection of selected households 

* Beneficiary registration 


<br><br>

### 2.7 5W results from the first quarter of 2022

In the first quarter of 2022, Food Security Cluster partners have reached a total of `r sum(fsc$new_beneficiaries) %>% format(big.mark = ",")` persons or `r round(sum(fsc$new_beneficiaries) / sum(fs_pin$fs_targeted) * 100, digits = 2)`% of the 2022 target. Below is an examination of the extent to which partners have targeted the townships most affected by conflict. 

In this first plot, the number of beneficiaries reached in the first quarter of 2022 are plotted against the targeted population. Each point is a township and the red line down the middle represents reaching 100% of the target. How far above or below a township is indicates how far above or below the target it is. Additionally, the township prioritisation group each township belongs to is marked by the colour. 

The townships on the far left of the plot have beneficiaries despite not having targets for 2022 (their targets have been nominally coded as $1$ so they appear on the plot). 

<br>

```{r plotly-tsp-reached-groups}
ben_target <- fsc %>% 
  group_by(admin3_pcode_old) %>% 
  summarise(beneficiaries = sum(new_beneficiaries),
            partners = n_distinct(org_code)) %>% 
  right_join(fs_pin %>% 
               rename(beneficiaries_2022 = beneficiaries), 
             by = c("admin3_pcode_old" = "admin3_pcode")) %>% 
  replace_na(list(beneficiaries = 0)) %>% 
  mutate(reached_pc = beneficiaries / fs_targeted,
         reached_pc = ifelse(is.infinite(reached_pc), 1, reached_pc),
         fs_targeted = ifelse(fs_targeted == 0 & beneficiaries > 0, 1, fs_targeted),
         fs_targeted = round(fs_targeted, digits = 0)) %>% 
  arrange(desc(reached_pc)) %>% 
  select(state, township, fs_pin, fs_targeted, beneficiaries, reached_pc, partners, cluster) %>%
  mutate(pc_target = round(beneficiaries / fs_targeted * 100, digits = 2)) %>% 
  ggplot(aes(x = fs_targeted, y = beneficiaries, colour = cluster, 
             text = paste0(township, ",", "\n",
                           state, ",", "\n",
                           "beneficiaries_2022: ", beneficiaries, "\n",
                           "target_2022: ", fs_targeted, "\n",
                           "%_reached: ", pc_target, "\n", 
                           "partners_2022: ", partners, "\n",
                           "group: ", cluster))) + 
  geom_abline(intercept = 0, slope = 1, lty = 2, colour = "red") + 
  geom_point(aes(size = beneficiaries), alpha = 0.8) +
  scale_size_continuous(guide = "none") + 
  scale_x_continuous(trans = "log10", labels = comma) + 
  scale_y_continuous(trans = "log10", labels = comma) +
  scale_colour_viridis_d(option = "cividis") +
  # scale_colour_manual(values = c("#575C6DFF", "#00204DFF", "#C4B56CFF", "#FFEA46FF")) +
  labs(y = "Beneficiaries", x = "Targeted population", colour = "Group", 
       title = "Targeted population per township vs beneficiaries reached in Q1 2022, by prioritisation group",
       subtitle = "The red line is 100% of target")

ggplotly(ben_target, tooltip = c("text"), width = 820) %>% 
  config(displayModeBar = FALSE) %>% 
  layout(title = list(text = 
                        paste0("Targeted population per township vs beneficiaries reached in Q1 2022",
                                    "<br>",
                                    "<sup>",
                                    "By prioritisation group; the red line is 100% of target; size is beneficiaries reached","</sup>")))
```

<br>

With reference to the table below, 2.8% of beneficiaries came from group A1 and 10.7% of beneficiaries came from group A2. On the surface, this seems like partners have made effort to reach conflict-affected townships. However, this reach has largely been due to oversubscription in Sittwe, where the number of beneficiaries reached in 152% of the target. 

The development of the prioritisation groups also brings up the broader point of whether or not cluster targets are in line with needs and if they should be reformulated based on the information now available, as the targets in groups B and C are noticeably higher than those in group A. Nevertheless, it is hoped that partners will be able to afford townships in groups A1 and A2 greater coverage as the year progresses. 

<br>

```{r table-reached-group}
fsc %>% 
  group_by(admin3_pcode_old) %>% 
  summarise(beneficiaries = sum(new_beneficiaries),
            partners = n_distinct(org_code)) %>% 
  right_join(fs_pin %>%  select(-beneficiaries), by = c("admin3_pcode_old" = "admin3_pcode")) %>% 
  replace_na(list(beneficiaries = 0)) %>%
  mutate(gap = ifelse(fs_targeted - beneficiaries <= 0, 0, fs_targeted - beneficiaries),
         reached_pc = 1 - gap / fs_targeted, 
         fs_targeted = ifelse(fs_targeted == 0 & beneficiaries > 0, 1, fs_targeted),
         has_beneficiaries = ifelse(beneficiaries > 0, 1, 0)) %>% 
  group_by(cluster) %>% 
  summarise(beneficiaries = sum(beneficiaries, na.rm = TRUE), 
            fs_targeted = sum(fs_targeted, na.rm = TRUE), 
            gap = sum(gap, na.rm = TRUE), 
            tsp_reached = sum(has_beneficiaries), 
            tsp_total = n()) %>% 
  mutate(pc_gap = round(gap / fs_targeted * 100, digits = 2), 
         pc_ben = round(beneficiaries / sum(beneficiaries) * 100, digits = 2)) %>% 
  mutate_at(vars(fs_targeted, gap), ~ round(.)) %>% 
  select(group = cluster, beneficiaries, `%_ben` = pc_ben, target = fs_targeted, 
         gap, `%_gap` = pc_gap, tsp_reached, tsp_total) %>% 
  kable(caption = "2022 Q1 beneficiaries and percent reached by prioritisation group", 
        format.args = list(big.mark = ",")) %>% 
  kable_classic_2("striped") %>% 
  footnote("Any reach above 100% is counted as 100%; exceeding the target in one township does not affect other townships", 
           general_title = "")

```

<br>


Overall, these are not encouraging results. Notably, only `r filter(fsc, state == "Sagaing") %>% {sum(.$new_beneficiaries)} %>% format(big.mark = ",")` beneficiaries being reached in the whole of Sagaing, where the fighting has been heaviest. It is recommended that targets and plans for the Food Security Cluster be reviewed, and partners be reminded to reallocate resources away from oversubscribed areas and away from group C, which are neither humanitarian nor development priorities.

<br><br>

### 2.8 Maps of conflict scores and prioritisation group

```{r map-conflict-score, fig.height=9}

fs_pin %>% 
  right_join(pcode3_shape, by = "admin3_pcode") %>% 
  st_as_sf() %>% 
  ggplot() +
  geom_sf(aes(fill = conflict_score), size = 0.1) + 
  geom_sf(data = pcode1_shape, alpha = 0, colour = "black", size = 0.5) + 
  scale_fill_viridis_c(option = "magma", direction = -1) +
  theme_void() + 
  labs(title = "Conflict score by township 2021",
       caption = "Data source: ACLED; acleddata.com",
       fill = "Conflict\nscore") +
  theme(plot.caption=element_text(hjust = 0.5), 
        plot.background = element_rect(fill = "white", colour = NA)) +

fs_pin %>% 
  left_join(pcode3_shape, by = "admin3_pcode") %>% 
  st_as_sf() %>% 
  ggplot() + 
  geom_sf(aes(fill = cluster), size = 0.1, colour = "gray20") + 
  geom_sf(data = pcode1_shape, alpha = 0, colour = "black", size = 0.5) +
  # scale_fill_manual(values = c("#575C6DFF", "#00204DFF", "#C4B56CFF", "#FFEA46FF")) +
  scale_fill_viridis_d(option = "cividis") +
  theme_void() +
  labs(title = "Townships by prioritisation group", 
       fill = "Priori-\ntisation\ngroup",
       caption = "Data source: ACLED; acleddata.com and FSC calculations") +
  theme(plot.caption=element_text(hjust = 0.5), 
        plot.background = element_rect(fill = "white", colour = NA))  
  
# ggsave("conflict_score_map.png", dpi = 300, height = 12, width = 7, units = "in")
```

<br><br>

### 2.9 Reference table for conflict variables

Below is an interactive reference table for the various types of conflict events by township. It also includes the overall conflict score and prioritisation groups. The search bar can be used to find specific townships, or any of the columns may be sorted according to ascending or descending values. The table currently shows townships in descending order of conflict score. 

<br>

```{r dt-reference-table}
fs_pin %>% 
  select(state, township, pop_2021 = population_2021_proj, ppl_km2 = population_density, 
         PIN = fs_pin, target = fs_targeted,
         group = cluster, vulnerability = mdp_adjust, conflict_score, fatalities,
         battles, explosions_remote_violence, protests_and_riots, strategic_developments, 
         violence_v._civilians = violence_against_civilians,
         admin3_pcode) %>% 
  arrange(desc(conflict_score)) %>% 
  datatable(filter = list(position = "top", clear = FALSE),
            options = list(pageLength = 10, scrollX = TRUE, 
                           headerCallback = DT::JS("function(thead) {", 
                          "  $(thead).css('font-size', '.8em');",
                          "}")),
            caption = htmltools::tags$caption(style = "caption-side: top;
                                              text-align: center; font-size: 140%;",
                                              "2021 conflict indicators by township")) %>% 
  formatRound(c("pop_2021", "ppl_km2", "target", "PIN"), digits = 0) %>% 
  formatRound(c("conflict_score", "vulnerability"), digits = 3) %>% 
  formatStyle(0, target = "row", lineHeight = "80%", fontSize = "80%")
 
```

<br><br><br>


## 3. Distribution of flood risk in Myanmar


### 3.1 Historical flood data 

In light of the impending monsoon season, the probability that a township will be affected by a major flood or cylconic event has been calculated. Major floods since 2008 have been factored into this calculation. 

For the moment, conflict incidence and flood and cyclone risk will be evaluated separately. Flood and storm surge risk exist as probabilities for the moment and are intended to support prepositioning and Disaster Risk Reduction. This might change were severe flooding to occur in 2022. 

<br>

```{r barplot-flood-risk}
floods_storm_surge %>% 
  select(starts_with("year_"), admin3_pcode) %>% 
  pivot_longer(cols = c(-admin3_pcode), names_to = "floods", values_to = "value") %>% 
  mutate(floods = str_remove_all(floods, "year_")) %>% 
  group_by(floods) %>% 
  filter(value == TRUE) %>% 
  summarise(townships_affected = n()) %>% 
  mutate(floods = fct_rev(floods)) %>% 
  ggplot(aes(x = townships_affected, y = floods, fill = floods)) + 
  geom_col() +  
  theme(legend.position = "none", 
        plot.caption = element_text(hjust = 0.5)) +
  labs(y = "", 
       x = "Number of townships affected", 
       title = "Townships affected by floods and storm surges (2008-2021)",
       caption = "Data source: MIMU and UNDP")

 # ggsave("by_floods.png", dpi = 300, height = 5, width = 8, units = "in")
```

<br>

Based on this data, a score was calculated for each township based on how many times it had been affected by floods since 2008. The table below also summarises the number of people in need (2022). `r fs_pin %>% filter(flood_count > 4) %>% {sum(.$fs_pin)} %>% round() %>% format(big.mark = ",")` people live in townships that have flooded more than 5 times since 2008. 

```{r}
fs_pin %>% 
  replace_na(list(flood_count = 0)) %>% 
  # mutate(flood_count = ifelse(flood_count > 4, "5-9", flood_count)) %>% 
  group_by(flood_count) %>% 
  summarise(townships = n(),
            people_in_need = round(sum(fs_pin), digits = 0)) %>% 
  arrange(desc(flood_count)) %>% 
  kable(caption = "Summary statistics by number of floods (2008-2021)", format.args = list(big.mark = ",")) %>% 
  kable_classic_2("striped", full_width = FALSE) %>% 
  column_spec(1:3, width = "8em") %>% 
  footnote("Data source: MIMU and UNDP", general_title = "")

```

<br><br>

### 3.2 Map of flood risk 

The map below shows the probability of each township being affected by floods. The areas with the greatest risk of flooding are in Mon, near the mouth of the Sittaung River and the Gulf of Mottama and those along the Ayeyarwady River, and to a lesser extent, along the Chindwin River. 

<br>

```{r map-floods, fig.height = 12}

fs_pin %>% 
  left_join(pcode3_shape, by = "admin3_pcode") %>% 
  st_as_sf() %>% 
  ggplot() +
  geom_sf(size = 0.1, aes(fill = flood_prob)) +
  geom_sf(data = pcode1_shape, size = 0.5, alpha = 0, colour = "gray20") +
  scale_fill_viridis(option = "plasma", direction = -1, label = percent_format(accuracy = 1)) +
  theme_void() +
  theme(plot.background = element_rect(fill = "white", colour = NA),
        plot.caption = element_text(hjust = 0.5)) +
  labs(title = "Flood risk in Myanmar (2008-2021)", 
       fill = "flood\nrisk", 
       caption =  "Data source: MIMU and UNDP")

# ggsave("flood_risk.png", dpi = 300, height = 12, width = 7, units = "in")
```

<br><br>

### 3.3 Reference table for flood risk 

Below is an interactive reference table for flood risk by township. It includes the number of times since 2008 a township has been affected by flooding (flood_count) and the probability of flooding (flood_risk). Similar to the interactive table in the previous chapter, the search bar can be used to find specific townships and any of the columns may be sorted according to ascending or descending values. The table currently shows townships sorted in descending order of flood risk.

<br>


```{r}
fs_pin %>% 
  select(state, township, pop_2021 = population_2021_proj, ppl_km2 = population_density, 
         PIN = fs_pin, target = fs_targeted,
         flood_risk = flood_prob, flood_count,
         admin3_pcode) %>% 
  arrange(desc(flood_risk)) %>% 
  datatable(filter = list(position = "top", clear = FALSE),
            options = list(pageLength = 10, scrollX = TRUE),
            caption = htmltools::tags$caption(style = "caption-side: top;
                                              text-align: center; font-size: 140%;",
                                              "Flood risk by township")) %>% 
  formatRound(c("pop_2021", "ppl_km2", "target", "PIN"), digits = 0) %>% 
  formatRound(c("flood_risk"), digits = 2)
 
```

<br><br><br>


## 4. Technical notes

These annexes contain additional technical information that informed the decisions in the earlier sections. 


### 4.1 Limitations and next steps 

<br>

#### FAO-WFP food security survey 

The most important limitation of the FAO-WFP survey was the exclusion of several key states and regions from the survey. Of particular interest are Sagaing, Magway and Mandalay where the conflict has been particularly intense. 

Furthermore, the dry zone was not surveyed. From an agricultural perspective, this is a major omission as the diversity of crops and, consequently, diets are much higher in the dry zone than in the other parts of the country, which are predominantly focused on paddy.

Additionally, the targeting process proposed in this document has not yet been trialled in the field. The Food Security Cluster does not have the resources to undertake a field test of this scale. However, every attempt has been made to corroborate the data presented in it. 

In spite of these major limitations and the numerous assumptions that have had to made, the FAO-WFP survey is most comprehensive dataset on food security that has been collected so far. Additional efforts will be made to cross-reference these data from those of other surveys. These models will be updated once the third round of the FAO-WFP survey is ready. As a final point in this section, the FAO-WFP survey, in spite of its limitations forms the basis of the People in Need calculations, which underpins a lot of the response. 

This paper serves a proposal on how vulnerable households (such as those in the priority group) may be identified and targeted. Should this methodology prove sound and viable, it is suggested that it be applied to either IFPRI's Household Welfare Survey as well as data collected in the third round of the FAO-WFP food security survey. 

<br><br>

#### MIMU-HARP vulnerability analysis 

The results of the 2015 [MIMU-HARP Vulnerability Analysis](http://themimu.info/vulnerability-in-myanmar) (used to inform scores for multidimensional vulnerability), have been updated using the 2019 Inter-Censal Survey results using the following formula:  

$$
2019TspValue = 2015TspValue / 2015DistrictValue \cdot 2019DistrictValue
$$

This allows the new township values to tally with the 2019 district-level inter-censal survey results as well as to preserve the order and relationships of townships within each district. For the two indicators in the 2015 dataset but not covered in the 2019 inter-censal survey, they were forward filled, using their 2015 values. To further improve these estimations, multiple imputation should be employed. But that will be left for any subsequent revisions to this document. 

<br><br>

#### Conflict data and ACLED

Perhaps the most key limitation has also been the lack of field access and detailed assessment data from many parts of the country. With the conflict ongoing, and the footprints of humanitarian agencies largely skewed towards Yangon and Rakhine, which have been comparatively less affected by the current crisis, there is a demonstrable dearth of first-hand information in key areas. This document has intended to circumvent this through the use of ACLED data, which is the most complete set of conflict incident data in Myanmar. 

However, the ACLED dataset is not without its limitations -- the majority of its information, about 85 percent, comes from subnational, national and international media sources. The remainder comes from ACLED’s partner, the Myanmar Peace Monitor, and reports from UN agencies, international monitoring groups, and local human rights organisations. The completeness of the conflict data and how representative it is of the situation on the ground is not something that is easily verifiable. Though it should be noted that ACLED still has the largest and most comprehensive dataset of conflict incidents in Myanmar. 

<br><br>

#### Targeting limitations 

The targeting processes proposed in this document have not yet been field-tested. Whilst the Food Security Cluster is confident of its analysis, it does not have the resources to undertake a field trial to understand how what has been proposed here will impact operations. Access to affected areas has also not been considered. 

Finally, the prioritisation process is not entirely seamless -- whilst it is possible to prioritise from the union-level down to the township level and from the village down to the household with few issues, there remains a gap between the township and the village. In large part, this is due to the dearth of data available at the village-tract and village levels. The Food Security Cluster has partially mitigated this through the provision of village names listed in the ACLED dataset, but this is also a partial list and not fully matched at the village-tract levels either. Unless there is a full enumeration of village tracts and villages in Myanmar, this issue is likely to persist. 


<br><br>

### 4.2 Food insecurity score

Food insecurity here is not recognised solely as a food secure/food insecure dichotomy and is instead treated as a matter of degree determined by an household's position the overall distribution of food security indicators. According to Betti and Verma, "by appropriately weighting non-monetary indicators of deprivation to reflect their dispersion and correlation, it is possible to construct quantitative indices of deprivation in its various dimensions". At its most basic, the food insecurity score is a weighted sum of its component indicators that rewards variables that are responsible for more of the variation in the dataset. 

To calculate the Food Insecurity Score, variables within the Livelihood Coping Strategies Index, Food Consumption Score and Food Insecurity Experience scale were rescaled to values between $0$ and $1$. For most of the variables, $1$ meant "yes" and $0$ meant "no"; however, for the Food Consumption Score indicators, $0$ meant that a household had consumed that food group every day for the past 7 days and $1$ meant that a household had not consumed that food group for any of the prior 7 days. Additionally, coping strategies indicators at the "crisis" level were coded as $0.5$ for "yes" and indicators at the "emergency" level were coded as $1$ for "yes". This is to replace the typical severity weighting that would normally have occurred but was not part of the FAO-WFP survey process. 

These calculations have been encoded in the R package **`mdepriv`** by Attilio and Aldo Benini, building on work done by Alperin and Van Kerm on their **`mdepriv`** Stata package. According to Alperin and Van Kerm, "**`mdepriv`** creates synthetic scores of deprivation that are linear functions of uni-dimensional deprivation items measured on the [0, 1] scale. With $x_{ij}$ denoting the value of a particular deprivation items $j$ for observation $i$. The synthetic scores calculated by **`mdepriv`** are weighted sums of the $M$ deprivation items: 

$$
s_i = \sum_{j = 1}^{M}{w_j}x_{ij}
$$

A food insecurity score was then calculated using the weighted sum of the individual variables. The weighting scheme chosen was the default for the **`mdepriv`** package, Cerioli and Zani (1990) weighting. Under this weighting scheme, variable weights are "frequency-based" and are a function of their sample mean but just slightly modified to give a stronger weight to relatively rare items so that variation is rewarded: 

$$
w_j \propto log(\frac{1}{\bar{x}_j})
$$
<br>

The specific code for generating the food insecurity score was:

```{r echo = TRUE}
food_insecurity_score <- survey %>% 
  mutate_at(vars(cs_crisis_sold_prod_assets, cs_crisis_no_school, 
                 cs_crisis_reduced_health_exp), ~ . * 0.5) %>% 
  select(survey_id, hhfcs_inv, # inverse of food consumption score
         cs_crisis_sold_prod_assets, cs_crisis_no_school, 
         cs_crisis_reduced_health_exp, cs_emergency_sold_house, 
         cs_emergency_hh_migration, cs_emergency_hh_risk,
         fies_fewfood, fies_skipped, fies_whlday, fies_hungry, 
         fies_ateless, fies_healthy, fies_runout, fies_worried, combined_wt) %>% 
  mdepriv(c("hhfcs_inv", 
         "cs_crisis_sold_prod_assets", "cs_crisis_no_school", 
         "cs_crisis_reduced_health_exp", "cs_emergency_sold_house", 
         "cs_emergency_hh_migration", "cs_emergency_hh_risk",
         "fies_fewfood", "fies_skipped", "fies_whlday", "fies_hungry", 
         "fies_ateless", "fies_healthy", "fies_runout", "fies_worried"), 
         output = "all", sampling_weights = "combined_wt")

```


<br><br>

### 4.3 Limited performance of linear regression models

Prior to the development of the decision trees detailed in section 1.3, a linear regression model had been considered. However, no matter how sophisticated the method, the highest r-squared achieved was 0.136 (a model with an r-squared with 1 would mean it was perfectly predictive and an r-squared of 0 means that a model does not predict any of the variance at all; in this case, the best linear regression model still left 86% of the variance unexplained). 

With reference to the plot below of predicted values of the food security score vs the actual values (of the best performing model), whilst there is a significant relationship, its predictive performance is too weak. This was why decision trees were employed instead of a linear model. Each point in the plot below is a household in the survey. 

<br>

```{r}
ols <- survey %>% select(fs_score4, hoh_female, rural, not_improved_sanitation, 
         contains("edu_"), contains("shocks_"), contains("income_ms"), survey_id) %>% 
  left_join(survey %>% select(survey_id, hoh_age) %>%
              mutate(value = 1) %>%
              pivot_wider(names_from = hoh_age, values_from = value, values_fill = 0) %>%
              select(-`NA`), by = "survey_id") %>% 
  select(-survey_id) %>%
  #slice(-c( 69, 297,  660, 1186, 1222, 1256, 1354, 1362, 1720, 
  #          1820, 1823, 1843, 1936, 1950, 1952, 2003, 2132, 2329, 2594)) %>% 
  drop_na() %>% 
  mutate(fs_score4 = log(fs_score4), 
         fs_score4 = ifelse(is.infinite(fs_score4), 0, fs_score4)) %>% 
  lm(fs_score4 ~ . + hoh_female:age_over65, data = .) 

survey %>% 
  ggplot(aes(x = predict(ols), y = log(fs_score4))) + 
  geom_point(alpha = .5) + 
  geom_abline(intercept = 0, slope = 1, lty = 2, colour = "red") + 
  labs(x = "Log of predicted values", y = "Log of actual values", 
       title = "Predicted vs actual values for the food insecurity score")


```

<br>

The relatively low r-squared is likely due to missing variables in the FAO-WFP survey. For instance, the factors that go into a household's decision to reduce their healthcare expenditures are too multifarious to be captured by a survey on food security. 

This pattern was also true for the overall food security score as well as the individual food security indices (Food Consumption Score, Food Insecurity Experience Scale and Coping Strategies Index).

For additional reference, the table below shows the top 15 variables ranked by the percentage of variance of the food security score that they explain. 


```{r table-pc-variance}
ols %>% 
  anova() %>% 
  tidy() %>% 
  mutate(pc_variance = sumsq / sum(sumsq)) %>% 
  arrange(desc(pc_variance)) %>% 
  select(term, pc_variance) %>% 
  mutate(pc_variance = round(pc_variance * 100, digits = 2)) %>% 
  head(15) %>% 
  kable(caption = "Variables ranked by percent of variance") %>% 
  column_spec(1:2, width = "8em") %>% 
  kable_classic_2("striped", full_width = FALSE)

```

<br> 

As a  point of reference, below is the code for the decision tree. 

```{r eval = FALSE, echo = TRUE}
tree_environment_circumstances <- survey %>% 
  rpart(priority ~  shocks_lostwork + shocks_sicknessdeath + shocks_foodprices + 
          shocks_conflict + shocks_cantworkbusiness + shocks_naturalhazard + 
          shocks_accessmarket + shocks_pricesother + shocks_other + 
          rural + not_improved_sanitation,
        data = .,  weights = combined_wt , cp = 0.01)

tree_demographics <- survey %>% 
  mutate(children_0_4 = ifelse(children_0_4 > 0, 1, 0)) %>% 
  rpart(priority ~ children_0_4 + hoh_female + hoh_education + hoh_age, 
        data = ., weights = combined_wt, cp = .0088)
```

<br><br>

### 4.4 Pairwise correlations of indicators in the FAO-WFP survey

The table below shows the top 20 correlations between the different variables collected by the FAO-WFP food security survey. When a particular variable is of interest, it may be useful and interesting to see which other variables it is most correlated with. 

The vast majority of the variables, with the exception of composite scores and indices, are binary, with 1 being true and 0 being false. In these cases, a correlation of 1 would mean that a variable is true for all the households in which its match is also true; a correlation of 0.5 means that a variable is true for 50% of the households in which its match is true. 

For instance, whether or not the head of the household has competed education (*edu_higher*) is most correlated with households whose main income source is stable non-agricultural income (*income_ms_stable_non_ag*). Similarly, a correlation of 0.175 between *rural* and *edu_none* indicates that in 17.5% of all rural households, the head of household has no formal educational attainment. It also should be noted that, as expected, the components of each of the food security indices are most correlated with each other. 

<br>

```{r}
survey_long %>%
  group_by(var) %>% 
  mutate(count = ifelse(value > 0, 1, 0),
         count = sum(count)) %>% 
  ungroup() %>% 
  filter(count > 50) %>% # adjust the threshold n
  pairwise_cor(var, survey_id, value = value, sort = TRUE) %>% 
  rename(variable = item1, match = item2) %>% 
  group_by(variable) %>% 
  top_n(abs(correlation), n = 20) %>% 
  filter(variable %out% c("lhcsi_range", "lhcsi_max", "fies_raw_range", "fies_raw", 
                      "fs_score", "fs_score2", "fs_score3", "fs_score4",
                      "csi_weighted", "priority", "hhfcs_inv")) %>% 
  filter(match %out% c("lhcsi_range", "lhcsi_max", "fies_raw_range", "fies_raw", 
                      "fs_score", "fs_score2", "fs_score3", "fs_score4",
                      "csi_weighted", "priority", "hhfcs_inv")) %>%
  datatable(filter = list(position = "top", clear = FALSE),
            options = list(pageLength = 10, scrollX = TRUE,
                           search = list(regex = TRUE))) %>% 
  formatRound(c("correlation"), digits = 3)
```

<br><br>

### 4.5 Calculating the conflict score

The conflict score here originally appeared in the Food Security Cluster's report [Understanding Conflict Dynamics in Myanmar through Conflict and Incident Data: A Food Security Perspective](https://food-security-cluster-myanmar.github.io/exploratory-data-analysis-acled-fsc/). It was calculated using ACLED data and yields a score for each township. 

The conflict score is an update of the conflict index in the [MIMU-HARP Vulnerability Analysis](http://themimu.info/vulnerability-in-myanmar), using 2021 data. The specific conflict variables that included in the score were battles, explosions and remote violence, non-peaceful protests and riots, conflict-related fatalities, strategic developments and number of IDPs. Only the version of the conflict score within this document takes IDPs into account; in the Food Security Cluster's earlier report on conflict (linked in the previous paragraph), the Cluster did not yet have access to detailed IDP data. 

The conflict score is an average of the normalised values of key conflict indicators. Its main use it to aid decisions about geographic prioritisation. These normalised values have been re-weighted with Betti-Verma method, which penalises redundancy and rewards variation; this is the only notable divergence from the MIMU-HARP methodology. The Betti-Verma method was employed through the **`mdepriv`** package developed by Attilio and Aldo Benini.

Unlike the Food Insecurity Score, the component variables of the conflict index were not binary, meaning that it was possible to take advantage of the Betti-Verma method's double-weighting rule which is sensitive to both the relative frequency of the variables and the correlation amongst variables: 

$$
w_j \propto (w_j^a \cdot w_j^b)
$$

However, it must be noted that whilst the scores themselves can be shared and used, replicating all the calculations will necessitate obtaining permission from [UNHCR](mailto:<Nii Ako Sowa>sowanii@unhcr.org?cc=myayaim@unhcr.org&subject=Myanmar%20IDP%20data) as their township-level breakdown of IDP populations have not been shared publicly. The specific code for calculating the conflict score can be found below. 

<br>

```{r eval=FALSE, echo=TRUE}

# Betti-Verma calculations and the construction of the conflict score
index_shares2 <- conflict_df2 %>%   
  mutate_at(vars(c(battles, explosions_remote_violence, violence_against_civilians, 
                   fatalities, strategic_developments, 
                   protests_and_riots, total_idps)), 
           scale) %>%  
  mutate_at(vars(c(battles, explosions_remote_violence, violence_against_civilians, 
                   fatalities, strategic_developments, 
                   protests_and_riots, total_idps)), 
           funs((. - min(., na.rm = T))/(max(., na.rm = T) - min(., na.rm = T)))) %>% 
  mdepriv(c("battles", "explosions_remote_violence", 
            "violence_against_civilians", "fatalities", 
            "strategic_developments", "protests_and_riots", "total_idps"),
          # IDP counts have been used in the score, but will not be read 
          # into this report
          method = "bv", output = "all")

```

